Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Names of outputs collected from tagged jupyter notebook cells is vulnerable to collisions #7302

Closed
e-miz opened this issue Oct 19, 2023 · 2 comments
Assignees
Labels
bug Something isn't working embed triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone.
Milestone

Comments

@e-miz
Copy link

e-miz commented Oct 19, 2023

Bug description

First: I'd like to celebrate since this is my first ever bug report I'm submitting to a project on github! 🥳 I cannot think of a more deserving project which I really want to see succeed.

That out of the way--I seem to have found a very frustrating bug. If I try to embed plots from two different notebooks in the same qmd file, a collision occurs if the plots come from the same cell number in the respective notebooks.

i.e. if I have two notebooks, and I embed their respective first cells in a .qmd document, their outputs will both be cell-2-output-1.png, so one will overwrite the other in the main document.

Steps to reproduce

This is what my document looks like:

---
title: Test Doc
author: Eli Mizrachi (they/them)
date: 2023-09-08
format: 
    html: default
---

{{< embed nb1.ipynb#out1 >}}

{{< embed nb2.ipynb#out1 >}}

<!--# Which ever notebook is embedded last will dominate. I.e. nb1 will dominate if listed second. -->

nb1 and nb2 are single cell notebooks with a simple plot output from matplotlib.

I have link to a reproducer repo here: https://github.com/e-miz/quarto-jup-emebed-collision

Simply run quarto render maindoc.qmd to generate the output.

Expected behavior

I would have expected that I can embed content from multiple distinct notebooks which happen to share cell numbers in the same document.

Actual behavior

Essentially what happens is because both notebooks output cell-2-output-1.png, the one embedded last overrides the other in the main document. Unfortunately, I can't figure out how to control quarto to append or prepend some kind of string to the output filenames from each notebook to prevent this from happening.

maindoc_files/
├── figure-html
│   └── cell-2-output-1.png
nb1_files/
├── figure-html
│   └── cell-2-output-1.png
nb2_files/
├── figure-html
│   └── cell-2-output-1.png

Your environment

IDE:

VSCode Version: 1.83.1 (system setup)
Commit: f1 b07bd25dfad€4b0167beb15359ae573aecd2cc
Date: (1 wk ago)
Electron: 25.8.4
ElectronBuildld: 24154031
Chromium: 114.0.5735.289
Nodejs: 18.15.0
V8: 11.4.183.29-electron.0
OS: Windows NTx64 10.0.19045

OS:

WSL2
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"

Quarto check output

Quarto 1.4.429
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.8: OK
      Dart Sass version 1.55.0: OK
      Deno version 1.33.4: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.4.429
      Path: /opt/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: v2023.10
      Chromium: 869685

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /home/emiz/.TinyTeX/bin/x86_64-linux
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.11.5 (Conda)
      Path: /home/emiz/miniconda3/envs/sv_SEDecayTime/bin/python
      Jupyter: 5.4.0
      Kernels: python3

(-) Checking Jupyter engine render....0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
(|) Checking Jupyter engine render....0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........(None)

      Unable to locate an installed version of R.
      Install R from https://cloud.r-project.org/
@e-miz e-miz added the bug Something isn't working label Oct 19, 2023
@mcanouil mcanouil added the triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone. label Oct 19, 2023
@cderv cderv changed the title Naming shceme of outputs collected from jupyter notebooks is vulnerable to collisions Naming scheme of outputs collected from jupyter notebooks is vulnerable to collisions Oct 20, 2023
@e-miz
Copy link
Author

e-miz commented Nov 5, 2023

I was able to play around with this a bit more using Quarto 1.4.480 and found that the cell output file name is changed when the jupyter cell uses #| label: instead of a cell tag in the JSON cell metadata.

e.g.

#| label: out1

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
plt.plot([1, 23, 2, 4])
ax.set(title="Notebook2")
plt.show()

results in out1-output-1.png instead of cell-2-output-1.png

This is much more in line with what I expect should happen. I suspect it works in some earlier versions as well, however
I was stuck on 1.4.63 which seemed to have other issues with embedding from multiple notebooks.

I've since upgraded to 1.4.480 and I'm using labels in the cells instead of cell tags--which I kind of hated anyways.

Something slightly annoying about this though is that I get a warning when I render about broken
crossref categories. The new crossrefs are definitely a cool feature to have, but for now I've commented out the warning in main.lua because it's impossible to see other warnings which are more important to me in my main project.

I'm on the fence about closing this since the functionality seems like it should work for outputs from cells with tags, but overall this is a pretty reasonable workaround.

@e-miz e-miz changed the title Naming scheme of outputs collected from jupyter notebooks is vulnerable to collisions Names of outputs collected from tagged jupyter notebook cells is vulnerable to collisions Nov 5, 2023
@chenseanxy
Copy link

chenseanxy commented Nov 15, 2023

I was able to play around with this a bit more using Quarto 1.4.480 and found that the cell output file name is changed when the jupyter cell uses #| label: instead of a cell tag in the JSON cell metadata.

e.g.

#| label: out1

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
plt.plot([1, 23, 2, 4])
ax.set(title="Notebook2")
plt.show()

results in out1-output-1.png instead of cell-2-output-1.png

This is much more in line with what I expect should happen. I suspect it works in some earlier versions as well, however I was stuck on 1.4.63 which seemed to have other issues with embedding from multiple notebooks.

I've since upgraded to 1.4.480 and I'm using labels in the cells instead of cell tags--which I kind of hated anyways.

This can cause problems too, because when these labels are propagated to the main document, collision wasn't checked either.

I think the root of the issue is, syntax of notebook_name.ipynb#id implies the references are namespaced by notebook names when they're not, so maybe behaviour should match that.

Update: after digging around the codebase a bit, I was able to find this which seemed to handle jupyter image files:

const imageFile = options.assets.figures_dir +

@dragonstyle dragonstyle added this to the v1.4 milestone Nov 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working embed triaged-to Issues that were not self-assigned, signals that an issue was assigned to someone.
Projects
None yet
Development

No branches or pull requests

4 participants