Cannot clean notebooks encountering “NotJSONError” with plotly js code inside #273

Closed
firezym opened this issue Apr 23, 2024 · 4 comments

firezym commented Apr 23, 2024

@srstevenson Thanks for this awesome repo. I'm having trouble cleaning notebooks that contain HTML/JS output. The detailed error is below. Could you please take a look? :)

System:

Windows Server 2022 Datacenter 21H2 20348.2402

Core packages:

jupyterlab >= 4.0.10
nbformat 5.9.2
nb-clean 3.2.0
plotly 5.18.0

Core commands:

nb-clean add-filter
git add plotly-example-2.ipynb

This works well on notebooks without Plotly output, but I get an error from this notebook, which contains Plotly's HTML/JS snippets: plotly-example-2.zip

Error:

I checked the JSON. The error occurs on line 29, which is where a chunk of JS begins that contains confusing "" sequences.

(dev) PS D:\Dapu\prod> git add plotly-example-2.ipynb
Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\dev\Lib\site-packages\nbformat\reader.py", line 19, in parse_json
    nb_dict = json.loads(s, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\dev\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\dev\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\dev\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 29 column 301224 (char 302194)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\ProgramData\miniconda3\envs\dev\Scripts\nb-clean.exe\__main__.py", line 7, in <module>
  File "D:\ProgramData\miniconda3\envs\dev\Lib\site-packages\nb_clean\cli.py", line 298, in main
    args.func(args)
  File "D:\ProgramData\miniconda3\envs\dev\Lib\site-packages\nb_clean\cli.py", line 150, in clean
    notebook = nbformat.read(input_, as_version=nbformat.NO_CONVERT)  # type: ignore[no-untyped-call]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\dev\Lib\site-packages\nbformat\__init__.py", line 174, in read
    return reads(buf, as_version, capture_validation_error, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\dev\Lib\site-packages\nbformat\__init__.py", line 92, in reads
    nb = reader.reads(s, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\dev\Lib\site-packages\nbformat\reader.py", line 75, in reads
    nb_dict = parse_json(s, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\dev\Lib\site-packages\nbformat\reader.py", line 25, in parse_json
    raise NotJSONError(message) from e
nbformat.reader.NotJSONError: Notebook does not appear to be JSON: '{\n "cells": [\n  {\n   "cell_type": "c...
error: external filter 'nb-clean clean' failed 1
error: external filter 'nb-clean clean' failed
warning: in the working copy of 'plotly-example-2.ipynb', LF will be replaced by CRLF the next time Git touches it
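
The traceback reports the failure as a character offset (`char 302194`). As a minimal sketch for inspecting what sits at that offset, the helper below tries to parse a string with the standard `json` module and returns the text surrounding the failure position. The name `json_error_context` is hypothetical and is not part of nbformat or nb-clean, and the sketch assumes the notebook content has already been read and decoded into a string.

```python
import json
from typing import Optional


def json_error_context(text: str, width: int = 40) -> Optional[str]:
    """Return the text around a JSON parse failure, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset, i.e. the "char N" part of the message.
        start = max(err.pos - width, 0)
        return text[start : err.pos + width]
    return None
```

For example, reading the notebook's bytes with `open(path, "rb")`, decoding them as UTF-8, and passing the result to the helper would show the characters around the reported offset, or `None` if the content parses cleanly in that form.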

Cannot reproduce using nbformat directly in Python:

When I load the notebook with nbformat myself, the error does not occur. The whole HTML content is available in notebook['cells'][0]['outputs'][0]['data']['text/html'].

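import nbformat

filename = "plotly-example-2.ipynb"
# Open the file with an explicit UTF-8 encoding and parse it without converting versions.
with open(filename, 'r', encoding='utf-8') as f:
    notebook = nbformat.read(f, as_version=nbformat.NO_CONVERT)

# The full Plotly HTML output of the first cell.
notebook['cells'][0]['outputs'][0]['data']['text/html']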

@srstevenson
Owner

I'm not able to reproduce this using the same versions of nb-clean and nbformat, either using the Git filter or invoking nb-clean manually:

$ nb-clean check plotly-example-2.ipynb
plotly-example-2.ipynb cell 0: metadata
plotly-example-2.ipynb cell 0: execution count
plotly-example-2.ipynb cell 0: outputs
plotly-example-2.ipynb metadata: language_info.version

However, I'm on Linux whereas you're on Windows and there's a warning from Git that LF line endings will be replaced with CRLF line endings on checkout in your output. To see if the line ending conversion is involved, do you have the same error if you run nb-clean outside the Git filter (nb-clean check plotly-example-2.ipynb)?
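
One difference between `nb-clean check <file>` and the Git filter is that the filter receives the notebook on standard input rather than opening the file itself. As a rough sketch for reproducing that path outside Git, the helper below pipes a file's raw bytes into a command and captures the result. The name `run_filter` is hypothetical, and the default command `nb-clean clean` is an assumption taken from the filter name in the error message above.

```python
import subprocess
from typing import Sequence


def run_filter(
    path: str, command: Sequence[str] = ("nb-clean", "clean")
) -> subprocess.CompletedProcess:
    """Pipe a file's raw bytes to a filter command on stdin."""
    with open(path, "rb") as f:
        data = f.read()
    # check=False so a failing filter can be inspected instead of raising.
    return subprocess.run(list(command), input=data, capture_output=True, check=False)
```

If `run_filter("plotly-example-2.ipynb").returncode` is non-zero, the captured `stderr` should contain the same traceback that `git add` prints, which would narrow the problem down to how the command handles piped input.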

@firezym
Author

firezym commented Apr 25, 2024

I can also run nb-clean check plotly-example-2.ipynb from the Windows PowerShell command line, and it returns the same results as yours.

But when I run git add plotly-example-2.ipynb, I still get the same error shown above.

My line-ending setting in the global Git config file C:\Users\Administrator\.gitconfig is as follows:

[core]
	autocrlf = input

Should I alter the autocrlf setting to something else?

@srstevenson
Owner

According to this PR in another project, Jupyter notebooks are always created with LF line endings on Windows. That suggests adding the following to the .gitattributes file in your repository (if you've not worked with the .gitattributes file before, the Git documentation covers its purpose and the available options):

*.ipynb  text eol=lf
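
To check which line endings the notebook in the working tree actually uses, before and after applying the attribute, a small sketch like the following may help. The function name `count_line_endings` is hypothetical. It only inspects the bytes it is given and says nothing about what Git stores in the repository.

```python
from typing import Dict


def count_line_endings(data: bytes) -> Dict[str, int]:
    """Count CRLF and bare LF line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}
```

Calling it on the bytes read from the notebook with `open(path, "rb")` should report zero CRLF endings if the file uses LF throughout.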

@srstevenson
Owner

I'll assume configuring .gitattributes worked: if you have any other trouble please open a new issue.
