Paper: Codebraid: Live Code in Pandoc Markdown #469

Open
wants to merge 25 commits into
base: 2019
from

9 participants
@gpoore
Contributor

commented May 22, 2019

I will be doing additional proofreading over the next few days.

I am also working on adding some additional features to the software. Depending on how that goes, I may try to add a few additional paragraphs later this week.

@stefanv

Member

commented Jun 5, 2019

👋 Hi @gpoore, I will be reviewing your paper.

Before I read through the text in detail, I wonder if you've run across any of the following (I don't see them mentioned in the text):

Their authors may also be good reviewers for this paper: @mwouts @aaren @Carreau @matthew-brett @choldgraf

@gpoore

Contributor Author

commented Jun 5, 2019

@stefanv I believe I have run across all of those. I'd appreciate input on how much detail to add about markdown-notebook programs, since there are a lot of them (more listed below!) and I'm still debating the best way to cover them. Currently, the introduction is focusing on Jupyter notebooks, knitr, and Org Babel, with a few mentions of additional programs and extensions. I could add some brief mentions of additional markdown-notebook programs. An alternative would be to restructure things somewhat and include a more in-depth comparison. The paper is a little more than 2 pages under maximum length, so there's easily space to add another page or so of detailed comparison if that's useful. Either way, I could probably rework the introduction/comparisons within a day or so.

I need to add discussion of nbconvert and cite it explicitly. Just as I was making the last revisions, I learned about global content filtering and tag-based filtering of cells/output. So parts of the introduction aren't up-to-date on nbconvert capabilities.

There actually is a brief citation of Jupyter Book ([LH19])...it might be good to mention Jupyter Book explicitly in the text, and this part will need to be revised in any case since it relates to nbconvert.

In terms of Jupytext, notedown, and nb2plots: I do need to add more about other markdown-notebook programs. In addition to those, I'm also aware of these (plus Pweave and Weave.jl that are already mentioned in the paper):

Also, while I've been working on this project over the last several months, Pandoc has added ipynb as a supported format (https://pandoc.org/demos.html, example 15). While I currently mention that in the conclusion, I should also add some comparison with it earlier.

@matthew-brett


commented Jun 5, 2019

I'm happy to help with review.

@stefanv

Member

commented Jun 5, 2019

Tip of the hat for mentioning org-mode. And, yes, pandoc is an obvious one I left out. I think a review section that outlines the landscape of these tools would be very helpful for readers and potential users. It would also give an even stronger motivation for your work.

I'll set myself a reminder to do another review here on Monday; does that seem like a good timeline?

@matthew-brett Thank you for agreeing to join in the review!

@gpoore

Contributor Author

commented Jun 5, 2019

@stefanv That sounds good. I'll rework the introduction and add an in-depth review of these additional tools, and have that ready for Monday.

@matthew-brett Thanks for reviewing!

@mwouts


commented Jun 5, 2019

Hello, I am not sure I can do an in-depth review, but I am happy to give a few comments...
First, thanks @gpoore for working on this, and thanks @stefanv for reaching out. This is a very interesting project.

Obviously I like the fact that it works with plain Markdown files! Also, I appreciate the possibility to write code over multiple cells.

I do not know how many people actually mix languages in a single document, but I would say anyway that this is already possible with R and Python both in R Markdown (with reticulate) and in Python jupyter notebooks (with the %%R magic cells).

Regarding outputs, I would say that support for rich outputs (images and JavaScript) is very important. Is this available in Codebraid?

Finally, @gpoore, you do mention that you are interested in converting Codebraid documents back and forth to Jupyter notebooks. Clearly this is a great idea; please get in touch if you would like to work together on the subject.

@gpoore

Contributor Author

commented Jun 6, 2019

@mwouts Thanks for the comments!

There isn't yet support for rich outputs, but that is planned. The lightweight, built-in code execution system can be extended to support some rich outputs. I expect to have Jupyter kernels working as an alternate execution system very soon, so that will automatically add a lot of rich output capabilities.

I don't know when I will get around to converting between Codebraid documents and Jupyter notebooks, but I would definitely be interested in working on that together when the time comes.

@choldgraf

Contributor

commented Jun 6, 2019

hey all - what kind of timeline are we looking at for a review? I'm gonna be slammed until at least the end of next week, so wouldn't be able to look at anything before then.

@matthew-brett


commented Jun 7, 2019

I was going to say the same thing as @mwouts: only a small subset of literate-programming use cases for scientific Python don't need plots, nice table displays, and so on.

To add to this crowded market - more Sphinx tools:

@deniederhut

Member

commented Jun 7, 2019

@choldgraf initial reviews are due June 11, but the open review period extends for another couple of weeks. If @gpoore is okay with getting some comments after the deadline, then it's also fine by me.

In case it's helpful, you can see our review criteria listed here - https://github.com/scipy-conference/scipy_proceedings/blob/2019/review_criteria.md

@gpoore

Contributor Author

commented Jun 7, 2019

@choldgraf @deniederhut I'm happy to receive additional comments after the initial reviews are due.

gpoore added some commits Jun 10, 2019

added section on new support for jupyter kernels, moved and reorganized implementation and debugging sections to account for this; added mention of jupyter kernels elsewhere as needed; added paragraph about importance of pandoc ast
@gpoore

Contributor Author

commented Jun 10, 2019

I uploaded an updated paper over the weekend, and just added an additional proofread.

  • The paper now begins with an in-depth review of additional tools that allow mixing text with executable code, with special focus on programs related to Markdown.

  • I now have Jupyter kernels working as an alternative to the built-in code execution system. I added a short example of this. I also merged and slightly condensed the old code execution and debugging sections, and moved them earlier in the paper so that this works better with the kernel example. A new Codebraid release with kernel support will be on GitHub and then PyPI shortly.

@stefanv

Member

commented Jun 11, 2019

execution. The template writes delimiters to stdout and stderr at the
beginning of each code chunk. These delimiters are based on a hash of
the code to avoid the potential for collisions. Once execution is
complete, Codebraid parses stdout and stderr and uses these delimiters

@MSeal

MSeal Jun 12, 2019

I am guessing this assumes single execution thread and order preserved stdout/stderr? Not unreasonable, but was just curious.

@gpoore

gpoore Jun 14, 2019

Author Contributor

Yes, those are essentially the assumptions. Technically, multiple threads, async code, or anything else that does not preserve stdout/stderr order are possible, as long as they are completely confined to a single code block, or to a sequence of code blocks that are marked as incomplete and followed by a complete block.
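
The delimiter scheme discussed in this sub-thread can be illustrated with a minimal Python sketch. This is not Codebraid's implementation: the function name `run_chunks` and the delimiter format are hypothetical, only stdout is handled (the quoted text says delimiters are also written to stderr), and every chunk is assumed to be complete top-level code.

```python
import hashlib
import subprocess
import sys


def run_chunks(chunks):
    """Run code chunks in one Python process and return stdout per chunk.

    Hypothetical helper for illustration only; not part of Codebraid.
    """
    # Derive the delimiter from a hash of the code, so it is very unlikely
    # to collide with anything the code itself prints.
    digest = hashlib.sha256("\n".join(chunks).encode("utf-8")).hexdigest()
    delim = f"CHUNK-DELIM-{digest}"

    # Build a single script that prints the delimiter, tagged with the
    # chunk index, immediately before each chunk.
    parts = []
    for index, code in enumerate(chunks):
        parts.append(f"print({delim!r} + ':{index}', flush=True)")
        parts.append(code)
    script = "\n".join(parts)

    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=False,
    )

    # Split stdout on the delimiter. The piece before the first delimiter
    # is dropped; each remaining piece starts with ":<index>\n".
    outputs = [""] * len(chunks)
    for piece in result.stdout.split(delim)[1:]:
        header, _, body = piece.partition("\n")
        outputs[int(header.lstrip(":"))] = body
    return outputs


if __name__ == "__main__":
    print(run_chunks(["print('hello')", "x = 1 + 1\nprint(x)"]))
```

In this sketch, output is attributed to a chunk purely by its position in the stream. Anything written by a thread after its chunk has finished would land under a later chunk, which is why such behavior has to stay confined to one chunk (or one group of chunks) as described above.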

@MSeal


commented Jun 12, 2019

Glad I got cc'd to see this write-up. It gave me a much better idea about codebraid, and it appears to be a much closer parallel to R-Markdown than other options out there for users who appreciated that pattern.

title = {Jupyter notebooks as {Markdown} documents, {Julia}, {Python} or {R} scripts},
year = {2018--2019},
url = {https://jupytext.readthedocs.io/},
}

@mwouts

mwouts Jun 12, 2019

Jupytext is indeed a team effort now (with 25 contributors already!). Still, I'd like to be mentioned among the authors in a bibliography entry. Or maybe, @gpoore, you could add an entry for the original announcement of Jupytext, at https://towardsdatascience.com/introducing-jupytext-9234fdff6c57 ?

@gpoore

gpoore Jun 14, 2019

Author Contributor

That's a good point. I'll add you as an author, and probably add an entry for the original announcement as well.

@stefanv
Member

left a comment

The paper currently heavily focuses on technical details, which I think is fine for the "body" of the paper. But it would be good to think a bit more about the story that the paper should tell. "Hey, here's my new document generator; I call it codebraid. I wrote it because I wanted to do X. There are other packages that do some parts of X, but codebraid is different in that it does A, B, and C. Let me show you how some of these features work: here's an example of A, one of B, and one of C. Let's look at how codebraid compares to those other packages (more in Appendix A). Conclusion: That was codebraid, and here is why you should use it."

Finally, don't forget to add links to the source repo! (I might have missed them.)

@@ -0,0 +1,2 @@
codebraid pandoc -f markdown -t rst --overwrite -o geoffrey_poore.rst poore.txt

@stefanv

stefanv Jun 12, 2019

Member

Nice :D 👍 🚀

@deniederhut

deniederhut Jun 13, 2019

Member

Using the tool to talk about the tool 👏👏👏


Codebraid executes designated inline code and code blocks in Pandoc
Markdown documents as part of the document build process. It includes a
lightweight, low-overhead code execution system. Alternatively, code can

@stefanv

stefanv Jun 12, 2019

Member

I imagine code execution is not something I'd expect to have large overhead. I will read further to see if there's anything special being done in this regard.

@gpoore

gpoore Jun 14, 2019

Author Contributor

The point I was trying to make is that the system that is executing the code has little effect on the overall runtime. In the Jupyter kernel case, starting the kernel can have a noticeable impact on overall runtime if the code itself doesn't take long to run. I'll probably remove "low-overhead" here and then explicitly explain this in the revised introduction.

combination of its original Markdown source, its code, the stdout or stderr
resulting from execution, or rich output in the case of Jupyter kernels.
There is also support for programmatically copying code or output to other
parts of a document.

@stefanv

stefanv Jun 12, 2019

Member

It is self-evident for those of us who know Markdown, but it may be worth mentioning that output types include PDF, HTML, EPUB, etc.

`copy=code1+code2`. It would also be possible to run the code elsewhere:

``````
```{.cb.run copy=code1+code2}

@stefanv

stefanv Jun 12, 2019

Member

Ah, ok, looks like I should rather review the .rst file---heading there.




Review

@stefanv

stefanv Jun 12, 2019

Member

I agree with @mwcraig's assessment. Sorry if I pushed you even further in this direction. While I think it is important to measure the "competition" and how codebraid is different, there is too much here and it needs to be in a different place in the paper (we first want to hear about codebraid!). A table with feature comparison is a great idea. These details can then move out to an appendix.


..
Pandoc Markdown defines attributes for inline code and code blocks.

@stefanv

stefanv Jun 12, 2019

Member

This is the first introduction to codebraid, but here we are digging into Pandoc semantics instead of getting a big picture overview of the project, its goals, features, etc.

gpoore added some commits Jun 17, 2019

converted review into shorter comparison plus appendix (includes Jupytext reference changes suggested by @mwouts); condensed abstract; new intro with examples
removed discussion of Pandoc AST from example section (duplicates knitr comparison); moved Pandoc attribute discussion from example section into its own section immediately afterward
improved Jupyter kernel section: clarified multiple kernels in response to @mwcraig, split example into two code chunks to illustrate features better
addressed most remaining points from @mwcraig: clarified raw display, clarified multiple languages with Jupyter kernels, mentioned session name restrictions, clarified "template" with outside_main, split paragraph under including external files, revised conclusion to emphasize Codebraid features; removed unneeded cb.nb example under commands to fix page length
@gpoore

Contributor Author

commented Jun 17, 2019

There is now a new introduction that provides examples at the very beginning. This is followed by just over a page of comparisons with knitr, Pweave, Org-mode Babel, and Jupyter Notebook, focusing on default features. A summary of extensions and other, similar programs is now in an appendix at the end (a little under 1 page).

I split the discussion of Pandoc attribute syntax into its own section, so it doesn't get in the way of the example of building a simple document. Thanks to @stefanv for pointing out that issue. I also revised the conclusion to emphasize features in response to @mwcraig, and have addressed essentially all of @mwcraig's other comments.

This is currently right at 8 pages, which does not leave room for a comparison table. If the current page of comparison would benefit from the addition of a table, or if there are any points where an additional example would be particularly helpful, I can free up space by dropping the subsection on outside_main and/or condensing the section on the built-in code execution system.

@choldgraf

Contributor

commented Jun 17, 2019

hey all - just a note that since @MSeal and @mwcraig and maybe @Carreau are reviewing this, I'll likely hold off on reviewing and will instead read through their own thoughts because this is a paper of interest to me! I'm particularly curious if / how functionality here could be woven in with Jupyter Book (or if the models these two projects follow are different enough that it doesn't make sense).

As folks mentioned RMarkdown - at some point it would be great if the Jupyter community could agree on a standard for a more high-powered flavor of markdown that we can start adopting across projects (maybe the answer is RMarkdown or pandoc markdown?). That's out of the scope of this project submission, but it's a point I think worth tackling as a community!

@deniederhut

Member

commented Jun 18, 2019

Whoa this has been a great group of reviews! Thanks everyone 😄

can be replaced by a display of any combination of its original Markdown
source, its code, the stdout or stderr resulting from execution, or rich
output in the case of Jupyter kernels. There is also support for
programmatically copying code or output to other parts of a document.

@stefanv

stefanv Jun 18, 2019

Member

This is a very good summary.

@stefanv
Member

left a comment

The paper is shaping up very nicely, Geoffrey! I made one structural suggestion still: to finish talking about codebraid and its features, before moving on to comparison. But I think the edits massively improve the readability and "punch" of the paper.

programmatically, so code can be executed at one location in a document and
its output displayed elsewhere.

Building a simple Codebraid document

@stefanv

stefanv Jun 18, 2019

Member

I would move anything related to codebraid to before the comparison.

@gpoore

gpoore Jun 19, 2019

Author Contributor

That does make things flow better. I've changed that, and added a paragraph at the end of the introduction that summarizes where the paper is going.

gpoore added some commits Jun 19, 2019

moved comparison to end per suggestion by @stefanv; added transition paragraph at end of intro; fixed PyPI acronym in intro
@mwcraig


commented Jun 19, 2019

I'll have a chance to take another look tonight -- looking forward to it based on all the comments!

@mwcraig

left a comment

This is really excellent, @gpoore! It flows much more smoothly and makes a great case for the advantages of codebraid. There are a couple of minor readability issues and a suggestion to include an example in which .cb.nb is used without a kernel or in which .cb.run is used with a kernel, to demonstrate that .cb.nb is really a shorthand for "run and display using a notebook-like format" while jupyter_kernel means "run in a kernel, producing rich output, and display however the rest of the codebraid commands indicate the output should be displayed."

I can see that .cb.nb jupyter_kernel=python3 will be a common use case, but codebraid's ability to produce notebook-like display without a notebook or to include rich output while by default hiding the code is really a plus.

Once again, nice job on this!

kernels ``:cite:`Kluyver2016` ``{=rst} to execute code. The first code block
that is executed with a given language can specify a kernel. In the example
below, the "``.cb.nb``" tells Codebraid to run the code and provide a
"notebook" display that shows both code and output.

@mwcraig

mwcraig Jun 21, 2019

I think it would be very useful here to emphasize that there are two separate directives here.

.cb.nb is essentially a shorthand for a particular set of display choices intended to reproduce the input/output pattern of a notebook.

jupyter_kernel=python3 opts in to running the code with a Jupyter kernel, the primary benefit of which (in this case) is getting the rich output you would see in a notebook.

While the two are often used together (and I think are used together in all of the examples of .cb.nb in the text), they could be used separately.

I think you could address this by saying explicitly that the jupyter_kernel bit indicates running the code in the kernel and generating rich output. You already come quite close to saying this, but I think erring on the side of explicit over implicit would be helpful.

```

Using Codebraid to execute code as part of the document conversion process is
as simple as replacing `pandoc` with `codebraid pandoc`:

@mwcraig

mwcraig Jun 21, 2019

There is a really unfortunate line break in the PDF between codebraid and pandoc. Forcing those two to be on the same line would be very nice.
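
For a build script, the "replace `pandoc` with `codebraid pandoc`" step quoted above might be sketched as follows. The helper name `with_codebraid` is hypothetical, and the file names are taken from the command shown earlier in this thread. That command also passes `--overwrite`, which is left out here because the starting point is a plain Pandoc command.

```python
def with_codebraid(argv):
    """Return a copy of a pandoc command line that runs through Codebraid.

    Hypothetical helper for illustration; it only rewrites the argument
    list and does not execute anything.
    """
    if not argv or argv[0] != "pandoc":
        raise ValueError("expected a command that starts with 'pandoc'")
    # "pandoc ..." becomes "codebraid pandoc ..."; all other arguments
    # are passed through unchanged.
    return ["codebraid", *argv]


pandoc_cmd = [
    "pandoc", "-f", "markdown", "-t", "rst",
    "-o", "geoffrey_poore.rst", "poore.txt",
]
codebraid_cmd = with_codebraid(pandoc_cmd)
print(" ".join(codebraid_cmd))
```

The resulting list could be handed to `subprocess.run` on a machine where both Pandoc and Codebraid are installed.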

not be ideal from some perspectives, it is the cost of maintaining full
compatibility with Pandoc Markdown syntax.

Finally, the `cb.nb` command runs code in "notebook mode." For inline code,

@mwcraig

mwcraig Jun 21, 2019

I think runs should change to displays here, unless I am misunderstanding what .cb.nb does. My understanding is that it displays both the input and the output, and could be used without a kernel as in:

```{.python .cb.nb}
print("Hello like a notebook!")
```

In a similar vein, I gather one could do something like this to run in a kernel but display only the rich output:

```{.python .cb.run jupyter_kernel=Python3}
%matplotlib inline
from matplotlib import pyplot as plt
plt.plot([0, 1, 4, 9, 16])
```
@gpoore

gpoore Jun 22, 2019

Author Contributor

I've reworded this to involve both runs and displays. Having displays does help, and I want to keep runs to emphasize that the code is executed (since it isn't always, as with cb.code).

These can be used to override the default display settings for each command.

`show` takes any combination of the following options: `markup` (display
Markdown source), `code`, `stdout`, `stderr`, and `none`. There is also

@mwcraig

mwcraig Jun 21, 2019

I know you said this earlier, but could you mention again here what code does, like you did for markup?

code. Multiple options can be combined, such as `show=code+stdout+stderr`.
Code chunks using `copy` can employ `copied_markup` to display the
Markdown source of the copied code chunk. When the `cb.expr` command is used,
the expression output is available via `expr`. `show` completely overwrites

@mwcraig

mwcraig Jun 21, 2019

Unfortunately, to the casual reader this bit

is available via `expr`. `show` completely overwrites

reads as "is available via expr.show completely overrides" which is confusing. How about changing the last sentence to something like:

Using show completely overwrites...

@gpoore

gpoore Jun 22, 2019

Author Contributor

That's a very good point! I hadn't thought about this inadvertently resembling some sort of attribute syntax.

the expression output is available via `expr`. `show` completely overwrites
the existing display settings.
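
As an illustration of how a `show` value such as `show=code+stdout+stderr` breaks down, here is a small Python sketch. It is not taken from Codebraid: `parse_show` is a hypothetical helper, the option names are only the ones mentioned in the quoted text, and the rule that `none` cannot be combined with other options is an assumption.

```python
# Option names mentioned in the quoted paper text. Codebraid may accept
# others (for example, for rich output); this list is illustrative only.
ALLOWED_SHOW_OPTIONS = (
    "markup", "code", "stdout", "stderr", "expr", "copied_markup", "none",
)


def parse_show(value):
    """Split a show value like 'code+stdout' into a list of options.

    Hypothetical helper; the validation rules are assumptions.
    """
    options = value.split("+")
    for option in options:
        if option not in ALLOWED_SHOW_OPTIONS:
            raise ValueError(f"unknown show option: {option!r}")
    if "none" in options:
        if len(options) > 1:
            # Assumption: "none" only makes sense on its own.
            raise ValueError("'none' cannot be combined with other options")
        return []
    return options


print(parse_show("code+stdout+stderr"))
```

Returning a fresh list each time mirrors the statement that `show` completely overwrites the existing display settings rather than adding to them.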

The display format can also be specified with `show`. `stdout`, `stderr`, and

@mwcraig

mwcraig Jun 21, 2019

Similar to the previous comment, this read to me as "The display format can also be specified with show.stdout, stderr" and was coupled with a line break right after stdout. The combination took me several tries to parse correctly.

Suggestion:

The display format of `stdout`, `stderr`, and `expr` can also be specified with `show`. The formats are `raw` (interpreted as Markdown)...

The Bash configuration file also specifies that the file extension `.sh`
should be used, and provides another four lines of template code to enable
`cb.expr`. So far, the longest configuration file, for Rust, is less than
fifty lines—counting empty lines.

@mwcraig

mwcraig Jun 21, 2019

Could you mention where in the repo an interested reader could find these configuration files if they wanted to see a full example? Note: It may be so obvious as to not need an explanation, didn't have a chance to take a look this morning.

@gpoore

gpoore Jun 22, 2019

Author Contributor

The config files are under languages/ in the repo. If you think that's worth mentioning, I'm happy to do that.

@mwcraig

mwcraig Jun 22, 2019

Nope, that seems easy enough to find.

Notice that `jupyter_kernel` was only needed (and only allowed) for the first
code chunk. The second code chunk is still using the same language
(`.python`), so it shares the same kernel. This Markdown results in
displayed code plus a plot, just as it would within a Jupyter notebook:

@mwcraig

mwcraig Jun 21, 2019

What do you think about changing this example to be .cb.run to demonstrate that .cb.nb is not necessary? You could add a sentence or clause indicating that cb.nb would produce a notebook-like display if that were desired. It just occurred to me that one advantage of codebraid over a traditional notebook is the ease of displaying only the output, but you have no example of that.

Alternatively, a sentence or clause indicating that this could be done with .cb.run in both places to display only the output would be helpful.
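
The rule in the quoted excerpt (`jupyter_kernel` is needed and allowed only on the first code chunk for a language, and later chunks share that kernel) can be modeled with a short Python sketch. The class `KernelRegistry` is hypothetical and is keyed by language alone as a simplification. It is not Codebraid's code and starts no real kernels.

```python
class KernelRegistry:
    """Track which kernel, if any, each language is bound to.

    Hypothetical model for illustration only.
    """

    def __init__(self):
        self._kernels = {}

    def resolve(self, language, jupyter_kernel=None):
        """Return the kernel name for a chunk, or None if no kernel is used."""
        if language not in self._kernels:
            # First chunk for this language: it may choose a kernel.
            # None stands for "no Jupyter kernel requested".
            self._kernels[language] = jupyter_kernel
        elif jupyter_kernel is not None:
            # Later chunks share the existing choice and may not set it.
            raise ValueError(
                f"jupyter_kernel is only allowed on the first {language} chunk"
            )
        return self._kernels[language]


registry = KernelRegistry()
first = registry.resolve("python", jupyter_kernel="python3")
second = registry.resolve("python")
print(first, second)
```

Under this model, choosing between `.cb.run` and `.cb.nb` on either chunk would not change the kernel choice at all, which matches the point that the display command and `jupyter_kernel` are separate settings.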

addressed @mwcraig comments; removed reference to cb.nb example under Jupyter kernels that is now a cb.run example; added missing mention of formats for rich_output under discussion of show
@mwcraig


commented Jun 22, 2019

@deniederhut -- I think this paper is ready to go and should be included in the proceedings. 👏 🚀

@stefanv

Member

commented Jun 23, 2019

@deniederhut I am also happy to sign off on the paper.

Thanks, @gpoore, for being so responsive; I really like where this landed!

@mwcraig


commented Jun 23, 2019

FYI @gpoore I just opened conda-forge/staged-recipes#8615 to add codebraid and bespon to conda-forge. Please comment on that PR whether you are willing to be listed as a maintainer on the conda-forge packages.

@gpoore

Contributor Author

commented Jun 23, 2019

@stefanv Thanks for being part of the review, especially for your very helpful comments on the overall structure of the paper!

@mwcraig Thanks for being part of the review, especially for all the detailed comments about things that could be ambiguous and examples that could be improved!

I'll do a final detailed proofread in the next day or two to make sure I haven't missed any grammar, layout, or typography issues.

@deniederhut

Member

commented Jun 25, 2019

Woohoo! Thanks everyone!

@deniederhut deniederhut added ready and removed pending-comment labels Jun 25, 2019
