empty lines at end of certain files cause parse to fail #31

Closed
poldrack opened this issue Aug 5, 2020 · 10 comments · Fixed by #36
Labels
bug Something isn't working

Comments

@poldrack

poldrack commented Aug 5, 2020

Describe the bug

The presence of three empty lines at the end of a particular file causes the build to break.

To Reproduce

Steps to reproduce the behavior:

  1. Clone https://github.com/poldrack/psych-open-science-guide
  2. "jb build guide" should work properly
  3. Add two additional line feeds to the end of guide/4_reproducibleanalysis.md
  4. "jb build guide" should now fail with an error.
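
Step 3 can also be scripted. The following is a minimal sketch, assuming a hypothetical helper named append_line_feeds (it is not part of Jupyter Book) and assuming the file path from the steps above:

```python
def append_line_feeds(path, count=2):
    """Append `count` line feed characters to the end of the file at `path`.

    Hypothetical helper for reproducing the report; not part of any library.
    """
    # newline="" keeps "\n" from being translated to "\r\n" on Windows.
    with open(path, "a", encoding="utf-8", newline="") as handle:
        handle.write("\n" * count)


if __name__ == "__main__":
    # Path taken from the reproduction steps; adjust to your checkout.
    append_line_feeds("guide/4_reproducibleanalysis.md")
```

Running it once from the repository root should leave the file with two extra trailing line feeds, after which step 4 can be tried.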

Expected behavior

When the extra lines are added, the following Exception occurs:

Environment

  • Python 3.8.3

  • output of jupyter-book --version:
    Jupyter Book: 0.7.3
    MyST-NB: 0.8.4
    Sphinx Book Theme: 0.0.33
    MyST-Parser: 0.9.0
    Jupyter-Cache: 0.2.2

  • Operating System: Mac OS X

@choldgraf
Member

I think you didn't paste in the exception :-)

@poldrack
Author

poldrack commented Aug 5, 2020

odd, thought I had, here it is:

Exception occurred:
  File "/Users/poldrack/anaconda3/envs/py38/lib/python3.8/site-packages/markdown_it/rules_block/list.py", line 285, in list_block
    contentStart = state.bMarks[startLine]
IndexError: list index out of range

@choldgraf
Member

choldgraf commented Aug 5, 2020

Interesting - I wonder if the empty lines are being treated as a special block by the markdown parser, then failing because they're empty?

ping @chrisjsewell since this seems like something in the bowels of markdown-it-py. I think we could probably fix this in jupyter book by stripping the end of the page, but maybe it's a bug that should be fixed deeper?
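
A minimal sketch of the "strip the end of the page" workaround, assuming a hypothetical helper named strip_trailing_blank_lines that would be applied to the page text before it reaches the parser (this is not an existing Jupyter Book function):

```python
def strip_trailing_blank_lines(text: str) -> str:
    """Drop trailing whitespace-only lines and end the text with one newline.

    Hypothetical helper; note that rstrip() also removes trailing spaces
    on the last line that has content.
    """
    stripped = text.rstrip()
    # Keep an empty document empty instead of turning it into "\n".
    return stripped + "\n" if stripped else ""


page = "- first item\n- second item\n\n\n\n"
print(repr(strip_trailing_blank_lines(page)))
```

This only hides the symptom for affected pages; it would not address the underlying parser behaviour.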

@firasm
Contributor

firasm commented Aug 14, 2020

I have also had this empty-line issue in some of my notebooks. I now know how to debug it, so I've just been fixing it wherever it has come up.

If another reproducible example is needed, I could probably create one if it'll help, but I think with the example above there may already be enough info?

@chrisjsewell
Member

moving this to markdown-it-py

@chrisjsewell chrisjsewell transferred this issue from executablebooks/jupyter-book Aug 14, 2020
@chrisjsewell chrisjsewell added the bug Something isn't working label Aug 14, 2020
@executablebooks executablebooks deleted a comment from welcome bot Aug 14, 2020
@chrisjsewell
Member

Could someone copy or link here a minimal example Markdown file where this occurs? Thanks.

@chrisjsewell chrisjsewell added this to To do in Chris S's TODO list via automation Aug 14, 2020
@chrisjsewell chrisjsewell changed the title from "empty lines at end of certain files cause build to fail" to "empty lines at end of certain files cause parse to fail" Aug 14, 2020
@choldgraf
Member

Ping @firasm and @poldrack in case they don't see Chris's request above!

@firasm
Contributor

firasm commented Aug 14, 2020

oops - didn't get notified of the above! Thanks @choldgraf

@chrisjsewell: Here's an example. There are only two commits in this repo, the first commit without the two blank lines (works fine), and the second commit with the two blank lines (build fails).

▶ jb build .
Running Sphinx v2.4.4
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed
reading sources... [100%] markdown                                              
Exception occurred:
  File "/Users/firasm/.pyenv/versions/3.8.3/lib/python3.8/site-packages/markdown_it/rules_block/list.py", line 285, in list_block
    contentStart = state.bMarks[startLine]
IndexError: list index out of range
The full traceback has been saved in /var/folders/64/bfv2dn992m17r4ztvfrt93rh0000gn/T/sphinx-err-k2gw1der.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
Traceback (most recent call last):
  File "/Users/firasm/.pyenv/versions/3.8.3/bin/jb", line 8, in <module>
    sys.exit(main())
  File "/Users/firasm/.pyenv/versions/3.8.3/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/firasm/.pyenv/versions/3.8.3/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/firasm/.pyenv/versions/3.8.3/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/firasm/.pyenv/versions/3.8.3/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/firasm/.pyenv/versions/3.8.3/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/firasm/.pyenv/versions/3.8.3/lib/python3.8/site-packages/jupyter_book/commands/__init__.py", line 140, in build
    _error(
  File "/Users/firasm/.pyenv/versions/3.8.3/lib/python3.8/site-packages/jupyter_book/utils.py", line 65, in _error
    raise kind(box)
ValueError: 
===============================================================================

There was an error in building your book. Look above for the error message.

===============================================================================

P.S. @poldrack I created a template jupyterbook, and replaced the content of markdown.md with your file: guide/4_reproducibleanalysis.md just to reproduce the bug (it was easier than tracking down which one of my files was showing this behaviour), I'll delete the repo once this issue is resolved.

@firasm
Contributor

firasm commented Aug 14, 2020

For what it's worth, I couldn't reproduce the issue with any old md file by adding two blank lines, only certain files.

@sildar
Contributor

sildar commented Aug 17, 2020

Hi,

I took a quick look at this.

When processing lists, there is a call to tokenize() that advances the state.line attribute:

list.py 
line 264: state.md.block.tokenize(state, startLine, endLine)

After this line is executed, state.line can be larger than endLine, which leads to the IndexError when state.bMarks is indexed at that line.

Simply breaking right after this line if state.line > endLine doesn't work, though: we have to update the nextLine variable first, as well as closing the list. That logic is actually already in the codebase, but the check comes one line too late:

line 280 onwards:
    token = state.push("list_item_close", "li", -1)
    token.markup = chr(markerCharCode)

    nextLine = startLine = state.line
    itemLines[1] = nextLine
    contentStart = state.bMarks[startLine]

    if nextLine >= endLine:
        break

The easy solution would be to just put the check a few lines earlier.

line 280 onwards:
    token = state.push("list_item_close", "li", -1)
    token.markup = chr(markerCharCode)

    nextLine = startLine = state.line  # we actually need to update these before breaking

    if nextLine >= endLine:
        break

    itemLines[1] = nextLine
    contentStart = state.bMarks[startLine]
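
The effect of moving the check can be illustrated with a toy model. This is only a sketch under stated assumptions: collect_content_starts, b_marks and advance are hypothetical stand-ins, not markdown-it-py code, and advance plays the role of the nested tokenize() call that may move past end_line.

```python
def collect_content_starts(b_marks, end_line, advance, check_before_lookup):
    """Toy stand-in for the list-item loop (hypothetical, not library code).

    `advance(line)` mimics the nested tokenize() call and may return a
    line number beyond `end_line`.
    """
    line = 0
    starts = [b_marks[line]]
    while True:
        line = advance(line)
        if check_before_lookup and line >= end_line:
            break  # proposed order: stop before touching b_marks
        content_start = b_marks[line]  # original order: may raise IndexError
        if line >= end_line:
            break
        starts.append(content_start)
    return starts


b_marks = [0, 7, 14, 20]  # start offsets for four lines


def overshoot(line):
    # Pretend the nested call consumed two lines at once.
    return line + 2


print(collect_content_starts(b_marks, 3, overshoot, check_before_lookup=True))
# prints [0, 14]

try:
    collect_content_starts(b_marks, 3, overshoot, check_before_lookup=False)
except IndexError as err:
    print("IndexError:", err)  # prints IndexError: list index out of range
```

With the check placed before the lookup, the loop exits cleanly once the line counter has run past the end; with the original ordering, the lookup happens first and fails.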

Also, I'm not sure what itemLines does; it's not really used at all in this method.

I can't run the test suite right now to check that this solution doesn't break anything else. But a diff on spec.md shows no difference.

If no one else writes the fix (feel free to do it), I'll do it this evening or tomorrow.

Edit: also, maybe there is a more elegant solution to implement in the tokenize() method.
Edit 2: I improved my reply to include more information/clarify

sildar added a commit to sildar/markdown-it-py that referenced this issue Aug 17, 2020
Chris S's TODO list automation moved this from To do to Done Aug 17, 2020