
Evaluate and Cache new Code Chunks in Documentation Mode #19

Open
brandonwillard opened this issue Feb 4, 2015 · 13 comments


@brandonwillard

If I add a new chunk after the previous chunks are cached, I get the following exception:

Pweave -f texminted -c -d missing_chunk_test.texw
Traceback (most recent call last):
  File "/usr/local/bin/Pweave", line 9, in <module>
    load_entry_point('Pweave==0.23', 'console_scripts', 'Pweave')()
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/scripts.py", line 53, in weave
    pweave.weave(infile, **opts_dict)
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/__init__.py", line 69, in weave
    doc.weave(shell)
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/pweb.py", line 141, in weave
    self.run(shell)
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/pweb.py", line 109, in run
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/processors.py", line 53, in run
    success = self._getoldresults()
  File "/usr/local/lib/python2.7/dist-packages/Pweave-0.23-py2.7.egg/pweave/processors.py", line 260, in _getoldresults
    executed.append(self._oldresults[i].copy())
IndexError: list index out of range
Makefile:14: recipe for target 'missing_chunk_test.tex' failed
make: *** [missing_chunk_test.tex] Error 1

I was assuming that the caching mechanism would notice the missing chunk, evaluate and cache it, then proceed. Is that the intended functionality?
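
For illustration, here is the kind of fallback I was expecting, as a minimal sketch; aside from the old-results list named in the traceback, these names are hypothetical and not Pweave's actual internals:

    def merge_with_cache(parsed_chunks, old_results, run_chunk):
        # Reuse a cached result when one exists and its source matches;
        # otherwise evaluate and cache the new chunk rather than indexing
        # past the end of the old results.
        executed = []
        for i, chunk in enumerate(parsed_chunks):
            if i < len(old_results) and old_results[i]["content"] == chunk["content"]:
                executed.append(old_results[i].copy())
            else:
                executed.append(run_chunk(chunk))
        return executed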

@sgi3

sgi3 commented Mar 9, 2015

I have the same problem; please fix this. As it works now, I have to cache all chunks again with Pweave -f texminted -c %.texw whenever I add a new chunk.

@mpastell
Owner

I don't have time to work on this at the moment. I agree that the implementation is not ideal; you're welcome to submit a pull request if you have a suggestion on how to fix it.

Note that Pweave only caches input and output text and not Python objects, so if new chunks need the data from old ones, there is no easy fix to this problem.

@brandonwillard
Author

Gotcha. I've been making some small changes toward those ends, so hopefully I'll have a pull request for you.


@brandonwillard
Author

Seems like one could simply bypass caching in documentation mode and use the caching magic in an IPython processor. A subclass of PwebIPythonProcessor that loads the extension and adds the magic before the self.IPy.run_* statements might do the trick.
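
Something like the following, perhaps. Only PwebIPythonProcessor and self.IPy come from Pweave; the method name I override, the cache file name, and the choice of the ipycache extension are all assumptions for illustration:

    from pweave.processors import PwebIPythonProcessor

    class CachingIPythonProcessor(PwebIPythonProcessor):
        def __init__(self, *args, **kwargs):
            super(CachingIPythonProcessor, self).__init__(*args, **kwargs)
            # Load an IPython caching extension once per session; ipycache's
            # %%cache magic is one candidate.
            self.IPy.run_line_magic("load_ext", "ipycache")

        def loadstring(self, code_str, **kwargs):
            # Stand-in for whichever method wraps the self.IPy.run_* calls:
            # prepend the cell magic so IPython persists the chunk's results
            # itself, bypassing Pweave's documentation-mode cache.  Note that
            # %%cache also expects the names of the variables to persist; a
            # real implementation would have to infer those from the chunk.
            code_str = "%%cache chunk_cache.pkl\n" + code_str
            return super(CachingIPythonProcessor, self).loadstring(code_str, **kwargs)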

@scfrank

scfrank commented Aug 24, 2017

Has there been any activity on this?

I'd really appreciate chunk-level caching functionality, which seems like it would be closely related.
My use case: I have an increasingly long document with more and more pweave-generated figures, where I'd like to only have to recompile the one I'm currently working on.

Thanks for creating pweave! It's encouraged me to plot more graphs, which is always good :-)

@brandonwillard
Author

I've been taking a shot at improved caching (see here), but progress has been slow due to multiple competing interests. Namely, a desire to

  • fold inline chunks into the general chunk framework,
  • provide multi-line chunk options,
  • provide generalized caching
    • e.g. naive output-only caching that considers changes in buffer content/source and chunk settings,
  • make everything work almost entirely within the Jupyter ecosystem
    • every chunk evaluation engine is necessarily a Jupyter kernel
    • use of nbformat as the underlying parsed document format,
  • and provide precision Python-only caching
    • bytecode-aware caching, via the mechanics behind the with hack given here (a rough sketch follows this list).
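
On that last point, the relevant mechanics are roughly the following; this is a hypothetical sketch rather than the branch's actual code. Called from a context manager's __enter__ with inspect.currentframe().f_back, it finds the enclosing with statement and hashes the AST of its body to form a cache key:

    import ast
    import hashlib
    import inspect
    import textwrap

    def with_body_cache_key(frame):
        # Hash the AST of the `with` body at the caller's current line; the
        # key changes with the code itself but not with formatting or comments.
        source = textwrap.dedent(inspect.getsource(frame.f_code))
        offset = frame.f_code.co_firstlineno
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.With) and node.lineno == frame.f_lineno - offset + 1:
                body = "\n".join(ast.dump(stmt) for stmt in node.body)
                return hashlib.sha256(body.encode("utf-8")).hexdigest()
        return None

Skipping the body on a cache hit is the genuinely hacky part (it takes sys.settrace tricks); the key extraction above is the portable piece.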

@mpastell
Owner

@brandonwillard Those are multiple big changes that you are talking about. Please don't submit them as one pull request, but split it into separate ones.

Note:

  • Every chunk evaluation engine is already a Jupyter kernel
  • I don't see the benefit of using nbformat as the parsed document format; you can already use it for output.

I suggest you first do:

  • fold inline chunks into the general chunk framework
  • provide generalized caching, e.g. naive output-only caching that considers changes in buffer content/source and chunk settings.

I have decided not to allow multi-line chunk options, as it breaks editor support and I haven't seen a compelling need for it. If you come up with a proper implementation with tests I can accept it, but submit it as a separate pull request.

@brandonwillard
Author

Oh, sorry, I hadn't done that work with a PR in mind; it was just a test branch that started with caching and turned into all sorts of stuff. If there's an interest in those latter two goals, I can separate them and make PRs. As for the nbformat idea, I can start an issue discussing my reasons.

@fgregg

fgregg commented Apr 13, 2018

@brandonwillard, how were you thinking of implementing save_chunk_state? (master...brandonwillard:caching-changes#diff-2747ccbd23b5ea3c1c42eb01071e5a6eR166)

@brandonwillard
Author

Ah, yeah, I left off with the idea of pickling the session in _[save|load]_chunk_state. That idea isn't all that efficient/feasible without, perhaps, an incremental approach.
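
Concretely, the session-pickling version would have looked something like this. It's only a sketch: the function names mirror the branch, but everything else, including the use of dill, is illustrative:

    import dill  # third-party; pickles far more of a session than pickle does

    def _save_chunk_state(chunk_number, cache_dir="cache"):
        # Snapshot the whole session after a chunk runs so a later weave can
        # restore state up to the last unchanged chunk.  Each snapshot is a
        # full copy, which is exactly the efficiency problem mentioned above.
        dill.dump_session("{0}/state_{1:03d}.pkl".format(cache_dir, chunk_number))

    def _load_chunk_state(chunk_number, cache_dir="cache"):
        dill.load_session("{0}/state_{1:03d}.pkl".format(cache_dir, chunk_number))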

At around the same time, I was experimenting with a more granular, variable-level caching that uses code/ASTs extracted from with bodies and had intended to port this idea instead of using (incremental) session caching.

Regardless, I've gone full org-mode nowadays, so I don't know when I'll get time to jump back into this!

@fgregg

fgregg commented Apr 13, 2018

Thanks @brandonwillard.

@fgregg

fgregg commented Apr 13, 2018

@brandonwillard, both of the approaches you considered seem particular to Python. Currently, it looks like Pweave is trying not to be tied to Python by using Jupyter to allow different kernels. Do you know if Jupyter kernel managers have a language-independent means to serialize the state of a kernel?

Stack Overflow seems to suggest no

@brandonwillard
Author

Yeah, I think that any non-naive caching (e.g. more than just caching output and validating against source text differences) is necessarily language-specific.

However, it seems like more than a few popular languages have straightforward runtime bytecode tools, AST generation, and, at the very least, introspection capabilities. As with Python, it's possible to implement less naive caching with those.
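
In Python, for instance, a formatting-insensitive cache key falls out of the ast module almost for free; chunk_cache_key here is a made-up name for illustration:

    import ast
    import hashlib

    def chunk_cache_key(source):
        # ast.dump omits line/column information by default, so edits to
        # whitespace or comments produce the same key, while semantic edits
        # change it and invalidate the chunk's cached output.
        return hashlib.sha256(ast.dump(ast.parse(source)).encode("utf-8")).hexdigest()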

Regarding Jupyter, it would be fantastic to see an abstraction of bytecode and/or AST objects exposed by the client protocol. The project has a somewhat related idea in its introspection messages. Otherwise, one can always implement smart caching at the kernel level and use custom messages.
