(use/api)=

# Python API

This page outlines how to utilise the cache programmatically.

The full Jupyter notebook for this page can be accessed here, for you to try yourself: {jupyter-download:notebook}`api`.

## Initialisation

```python
from pathlib import Path
import nbformat as nbf
from jupyter_cache import get_cache
from jupyter_cache.base import NbBundleIn
from jupyter_cache.executors import load_executor, list_executors
from jupyter_cache.utils import (
    tabulate_cache_records,
    tabulate_stage_records
)
```

First, we set up a cache and ensure that it is cleared.

```{important}
Clearing a cache wipes its entire content, including any settings (such as cache limit).
```

```python
cache = get_cache(".jupyter_cache")
cache.clear_cache()
cache
```

```
JupyterCacheBase('/Users/cjs14/GitHub/jupyter-cache/docs/using/.jupyter_cache')
```

```python
print(cache.list_cache_records())
print(cache.list_staged_records())
```

```
[]
[]
```


## Caching Notebooks

To directly cache a notebook:

```python
record = cache.cache_notebook_file(
    path=Path("example_nbs", "basic.ipynb")
)
record
```

```
NbCacheRecord(pk=1)
```

This adds a physical copy of the notebook to the cache (stripped of any text cells) and returns the record that has been added to the cache database.

```{important}
The returned record is static: it will not update if the database is later updated.
```
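To picture what "stripped of any text cells" means, here is a minimal sketch on a plain notebook dictionary (nbformat v4 layout). The helper `strip_text_cells` is hypothetical and only models the idea; it is not jupyter-cache's own code.

```python
import copy


def strip_text_cells(nb: dict) -> dict:
    """Return a copy of a notebook dict that keeps only its code cells.

    Hypothetical helper: a rough model of what ends up stored in the cache,
    not jupyter-cache's implementation.
    """
    stripped = copy.deepcopy(nb)  # leave the caller's notebook untouched
    stripped["cells"] = [c for c in stripped["cells"] if c["cell_type"] == "code"]
    return stripped


notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# a title\n\nsome text\n"},
        {"cell_type": "code", "execution_count": 1, "metadata": {},
         "source": "a=1\nprint(a)",
         "outputs": [{"name": "stdout", "output_type": "stream", "text": "1\n"}]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

cached_copy = strip_text_cells(notebook)
print(len(notebook["cells"]), len(cached_copy["cells"]))  # 2 1
```

This matches the single cell reported by `NbBundleOut(nb=Notebook(cells=1), ...)` further down.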

The record stores metadata for the notebook:

```python
record.to_dict()
```

```
{'hashkey': '818f3412b998fcf4fe9ca3cca11a3fc3',
 'description': '',
 'created': datetime.datetime(2020, 3, 13, 1, 6, 49, 157003),
 'uri': 'example_nbs/basic.ipynb',
 'pk': 1,
 'data': {},
 'accessed': datetime.datetime(2020, 3, 13, 1, 6, 49, 157017)}
```

```{important}
The URI that the notebook is read from is stored, but it has no impact on later comparison of notebooks: they are compared only by their internal content.
```

We can list all cache records, or retrieve one by its primary key (pk):

```python
cache.list_cache_records()
```

```
[NbCacheRecord(pk=1)]
```

```python
cache.get_cache_record(1)
```

```
NbCacheRecord(pk=1)
```

To load the entire notebook that is related to a pk:

```python
nb_bundle = cache.get_cache_bundle(1)
nb_bundle
```

```
NbBundleOut(nb=Notebook(cells=1), record=NbCacheRecord(pk=1), artifacts=NbArtifacts(paths=0))
```

```python
nb_bundle.nb
```

```
{'cells': [{'cell_type': 'code',
   'execution_count': 1,
   'metadata': {},
   'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': '1\n'}],
   'source': 'a=1\nprint(a)'}],
 'metadata': {'kernelspec': {'display_name': 'Python 3',
   'language': 'python',
   'name': 'python3'},
  'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
   'file_extension': '.py',
   'mimetype': 'text/x-python',
   'name': 'python',
   'nbconvert_exporter': 'python',
   'pygments_lexer': 'ipython3',
   'version': '3.6.1'},
  'test_name': 'notebook1'},
 'nbformat': 4,
 'nbformat_minor': 2}
```

Trying to add a notebook to the cache that matches an existing one results in an error, since the cache ensures that all notebook hashes are unique:

```python
record = cache.cache_notebook_file(
    path=Path("example_nbs", "basic.ipynb")
)
```

```
CachingError: Notebook already exists in cache and overwrite=False.
```
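The uniqueness rule can be modelled as a store keyed by a content hash that refuses a second entry under the same key. This is a minimal in-memory sketch: `TinyCache`, its MD5-over-JSON hash, and its `overwrite` keyword are all hypothetical, the keyword being modelled only on the wording of the error message above.

```python
import hashlib
import json


class CachingError(Exception):
    """Raised when a notebook with the same hash is already cached."""


class TinyCache:
    """Hypothetical in-memory cache keyed by a content hash (illustration only)."""

    def __init__(self):
        self._pk_by_hash = {}
        self._next_pk = 1

    @staticmethod
    def hashkey(code_sources):
        # Serialise the ordered code-cell sources and hash them.
        payload = json.dumps(code_sources).encode("utf-8")
        return hashlib.md5(payload).hexdigest()

    def add(self, code_sources, overwrite=False):
        key = self.hashkey(code_sources)
        if key in self._pk_by_hash and not overwrite:
            # Unique hashes are what turn a duplicate into an error.
            raise CachingError("Notebook already exists in cache and overwrite=False.")
        pk = self._next_pk
        self._next_pk += 1
        self._pk_by_hash[key] = pk
        return pk


tiny = TinyCache()
first_pk = tiny.add(["a=1\nprint(a)"])
try:
    tiny.add(["a=1\nprint(a)"])
    duplicate_rejected = False
except CachingError as err:
    duplicate_rejected = True
    print(f"CachingError: {err}")
```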

If we load a notebook from outside the cache, we can try to match it to one
stored inside the cache:

```python
notebook = nbf.read(str(Path("example_nbs", "basic.ipynb")), 4)
notebook
```

```
{'cells': [{'cell_type': 'markdown',
   'metadata': {},
   'source': '# a title\n\nsome text\n'},
  {'cell_type': 'code',
   'execution_count': 1,
   'metadata': {},
   'source': 'a=1\nprint(a)',
   'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': '1\n'}]}],
 'metadata': {'test_name': 'notebook1',
  'kernelspec': {'display_name': 'Python 3',
   'language': 'python',
   'name': 'python3'},
  'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
   'file_extension': '.py',
   'mimetype': 'text/x-python',
   'name': 'python',
   'nbconvert_exporter': 'python',
   'pygments_lexer': 'ipython3',
   'version': '3.6.1'}},
 'nbformat': 4,
 'nbformat_minor': 2}
```

```python
cache.match_cache_notebook(notebook)
```

```
NbCacheRecord(pk=1)
```

Notebooks are matched by a hash based only on the aspects of the notebook that affect its execution (and hence its outputs). So changing text cells will still match the cached notebook:

```python
notebook.cells[0].source = "change some text"
```

```python
cache.match_cache_notebook(notebook)
```

```
NbCacheRecord(pk=1)
```

But changing code cells will result in a different hash, and so will not be matched:

```python
notebook.cells[1].source = "change some source code"
```

```python
cache.match_cache_notebook(notebook)
```

```
KeyError: 'Cache record not found for NB with hashkey: 74933d8a93d1df9caad87b2e6efcdc69'
```
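The matching behaviour can be reproduced with a toy hash that only looks at "execution-relevant" parts of a notebook. This is a sketch under an assumption: `execution_hash` treats the code-cell sources plus the kernelspec as relevant and ignores markdown, outputs and execution counts; jupyter-cache's real hashing rules may include or exclude other fields.

```python
import hashlib
import json


def execution_hash(nb: dict) -> str:
    """Hypothetical hash over the parts of a notebook assumed to affect execution."""
    relevant = {
        "kernelspec": nb.get("metadata", {}).get("kernelspec"),
        "code": [c["source"] for c in nb["cells"] if c["cell_type"] == "code"],
    }
    blob = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.md5(blob).hexdigest()


def make_nb(markdown: str, code: str) -> dict:
    """Build a two-cell notebook dict (markdown then code) for the demo."""
    return {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": markdown},
            {"cell_type": "code", "execution_count": 1, "metadata": {},
             "source": code, "outputs": []},
        ],
        "metadata": {"kernelspec": {"name": "python3"}},
    }


original = execution_hash(make_nb("# a title", "a=1\nprint(a)"))
text_changed = execution_hash(make_nb("change some text", "a=1\nprint(a)"))
code_changed = execution_hash(make_nb("# a title", "change some source code"))
print(original == text_changed)  # True
print(original == code_changed)  # False
```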

To understand the difference between an external notebook and one stored in the cache, we can 'diff' them:

```python
print(cache.diff_nbnode_with_cache(1, notebook, as_str=True))
```

```
nbdiff
--- cached pk=1
+++ other:
## inserted before nb/cells/0:
+  code cell:
+    execution_count: 1
+    source:
+      change some source code
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          1

## deleted nb/cells/0:
-  code cell:
-    execution_count: 1
-    source:
-      a=1
-      print(a)
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          1
```
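The report above has the look of an nbdime `nbdiff` output. For a quick look at which code changed, a much cruder stand-in can be built from the standard library's `difflib`; this is a swapped-in technique, not what `diff_nbnode_with_cache` does, and the helpers `code_lines` and `diff_code_cells` are hypothetical.

```python
import difflib


def code_lines(nb: dict) -> list:
    """Split the code-cell sources of a notebook dict into lines (markdown is skipped)."""
    lines = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            lines.extend(cell["source"].splitlines())
    return lines


def diff_code_cells(cached: dict, other: dict) -> str:
    """Line-based unified diff of code-cell sources (a crude stand-in for nbdiff)."""
    diff = difflib.unified_diff(
        code_lines(cached),
        code_lines(other),
        fromfile="cached pk=1",
        tofile="other",
        lineterm="",  # lines carry no newline, so join them with "\n" below
    )
    return "\n".join(diff)


cached_nb = {"cells": [{"cell_type": "code", "source": "a=1\nprint(a)"}]}
other_nb = {"cells": [
    {"cell_type": "markdown", "source": "change some text"},
    {"cell_type": "code", "source": "change some source code"},
]}
print(diff_code_cells(cached_nb, other_nb))
```

Unlike a notebook-aware diff, this flattens cell boundaries and ignores outputs entirely.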


We can cache this altered notebook via an `NbBundleIn`, which also lets us attach extra data to the record. Note that this does not remove the previously cached notebook:

```python
nb_bundle = NbBundleIn(
    nb=notebook,
    uri=Path("example_nbs", "basic.ipynb"),
    data={"tag": "mytag"}
)
cache.cache_notebook_bundle(nb_bundle)
```

```
NbCacheRecord(pk=2)
```

```python
print(tabulate_cache_records(
    cache.list_cache_records(), path_length=1, hashkeys=True
))
```

```
  ID  Origin URI    Created           Accessed          Hashkey                           Data
----  ------------  ----------------  ----------------  --------------------------------  ----------------
   2  basic.ipynb   2020-03-13 01:07  2020-03-13 01:07  74933d8a93d1df9caad87b2e6efcdc69  {'tag': 'mytag'}
   1  basic.ipynb   2020-03-13 01:06  2020-03-13 01:06  818f3412b998fcf4fe9ca3cca11a3fc3
```


Notebooks are retained in the cache until the cache limit is reached,
at which point the oldest notebooks are removed. The limit can be inspected and changed:

```python
cache.get_cache_limit()
```

```
1000
```

```python
cache.change_cache_limit(100)
```
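The eviction rule ("oldest notebooks are removed") can be sketched as keeping the newest `limit` records. This is only an illustration: `enforce_cache_limit` is hypothetical, and ordering by the `created` timestamp (rather than, say, `accessed`) is an assumption, since the page does not say which one defines "oldest".

```python
from datetime import datetime, timedelta


def enforce_cache_limit(records: list, limit: int) -> list:
    """Keep at most `limit` records, dropping the oldest first.

    Hypothetical helper: each record is a dict with a "created" datetime;
    using "created" as the age criterion is an assumption.
    """
    newest_first = sorted(records, key=lambda r: r["created"], reverse=True)
    return newest_first[:limit]


start = datetime(2020, 3, 13, 1, 0)
records = [{"pk": pk, "created": start + timedelta(minutes=pk)} for pk in range(1, 6)]
kept = enforce_cache_limit(records, limit=3)
print([r["pk"] for r in kept])  # [5, 4, 3]
```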

## Staging Notebooks for Execution

Notebooks can be staged by adding their path as a stage record.

```{important}
This does not physically add the notebook to the cache;
it merely stores its URI for later use.
```

```python
record = cache.stage_notebook_file(Path("example_nbs", "basic.ipynb"))
record
```

```
NbStageRecord(pk=1)
```

```python
record.to_dict()
```

```
{'assets': [],
 'traceback': '',
 'pk': 1,
 'uri': '/Users/cjs14/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb',
 'created': datetime.datetime(2020, 3, 13, 1, 7, 5, 767952)}
```

If the staged notebook matches one in the cache, we can retrieve its cache record:

```python
cache.get_cache_record_of_staged(1)
```

```
NbCacheRecord(pk=1)
```

```python
print(tabulate_stage_records(
    cache.list_staged_records(), path_length=2, cache=cache
))
```

```
  ID  URI                      Created             Assets    Cache ID
----  -----------------------  ----------------  --------  ----------
   1  example_nbs/basic.ipynb  2020-03-13 01:07         0           1
```


If we stage a notebook that cannot be found in the cache, it will be listed for execution:

```python
record = cache.stage_notebook_file(Path("example_nbs", "basic_failing.ipynb"))
record
```

```
NbStageRecord(pk=2)
```

```python
cache.get_cache_record_of_staged(2)  # returns None
```

```python
cache.list_staged_unexecuted()
```

```
[NbStageRecord(pk=2)]
```
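Because a stage record only remembers a URI, deciding what still needs executing amounts to comparing each staged notebook's hash against the cached hashes. The sketch below is hypothetical throughout: `StageRecord` and `staged_unexecuted` are illustrative names, the `hash_of_uri` mapping stands in for reading and hashing each file, and its hash strings are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class StageRecord:
    """Hypothetical stand-in for a stage record: it only remembers a URI."""
    pk: int
    uri: str


def staged_unexecuted(
    staged: List[StageRecord],
    hash_of_uri: Dict[str, str],
    cached_hashes: Set[str],
) -> List[StageRecord]:
    """Return the staged records whose notebook hash has no cached match."""
    # hash_of_uri stands in for re-reading and hashing each file at its URI.
    return [rec for rec in staged if hash_of_uri[rec.uri] not in cached_hashes]


staged = [
    StageRecord(pk=1, uri="example_nbs/basic.ipynb"),
    StageRecord(pk=2, uri="example_nbs/basic_failing.ipynb"),
]
hash_of_uri = {
    "example_nbs/basic.ipynb": "hash-of-basic",            # placeholder value
    "example_nbs/basic_failing.ipynb": "hash-of-failing",  # placeholder value
}
cached_hashes = {"hash-of-basic"}

pending = staged_unexecuted(staged, hash_of_uri, cached_hashes)
print([rec.pk for rec in pending])  # [2]
```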

```python
print(tabulate_stage_records(
    cache.list_staged_records(), path_length=2, cache=cache
))
```

```
  ID  URI                              Created             Assets    Cache ID
----  -------------------------------  ----------------  --------  ----------
   2  example_nbs/basic_failing.ipynb  2020-03-13 01:07         0
   1  example_nbs/basic.ipynb          2020-03-13 01:07         0           1
```


To remove a notebook from the staging area:

```python
cache.discard_staged_notebook(1)
```

```python
print(tabulate_stage_records(
    cache.list_staged_records(), path_length=2, cache=cache
))
```

```
  ID  URI                              Created             Assets
----  -------------------------------  ----------------  --------
   2  example_nbs/basic_failing.ipynb  2020-03-13 01:07         0
```


## Execution

Suppose we have some staged notebooks (here we clear the cache and stage two afresh):

```python
cache.clear_cache()
cache.stage_notebook_file(Path("example_nbs", "basic.ipynb"))
cache.stage_notebook_file(Path("example_nbs", "basic_failing.ipynb"))
```

```
NbStageRecord(pk=2)
```

```python
print(tabulate_stage_records(
    cache.list_staged_records(), path_length=2, cache=cache
))
```

```
  ID  URI                              Created             Assets
----  -------------------------------  ----------------  --------
   2  example_nbs/basic_failing.ipynb  2020-03-13 01:15         0
   1  example_nbs/basic.ipynb          2020-03-13 01:15         0
```


Then we can select an executor (executors are registered as entry points) to execute the notebooks.

```{note}
To view the executor's log, make sure logging is enabled.
```

```python
list_executors()
```

```
[EntryPoint.parse('basic = jupyter_cache.executors.basic:JupyterExecutorBasic')]
```

```python
from logging import basicConfig, INFO
basicConfig(level=INFO)

executor = load_executor("basic", cache=cache)
executor
```

```
JupyterExecutorBasic(cache=JupyterCacheBase('/Users/cjs14/GitHub/jupyter-cache/docs/using/.jupyter_cache'))
```

```python
result = executor.run_and_cache()
result
```

```
INFO:jupyter_cache.executors.base:Executing: /Users/cjs14/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb
INFO:jupyter_cache.executors.base:Execution Succeeded: /Users/cjs14/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb
INFO:jupyter_cache.executors.base:Executing: /Users/cjs14/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb
ERROR:jupyter_cache.executors.base:Execution Failed: /Users/cjs14/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb

{'succeeded': ['/Users/cjs14/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb'],
 'excepted': ['/Users/cjs14/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb'],
 'errored': []}
```
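The shape of that result can be mimicked by a simple loop that times each run and keeps a traceback for failures. This is a hypothetical sketch, not the code behind `run_and_cache`: `run_and_cache_sketch` and `fake_execute` are illustrative names, and because the page does not explain how `errored` differs from `excepted`, the sketch never fills `errored`.

```python
import time
import traceback
from typing import Callable, Dict, List, Tuple


def run_and_cache_sketch(
    uris: List[str], execute: Callable[[str], None]
) -> Tuple[Dict[str, List[str]], Dict[str, str], Dict[str, float]]:
    """Hypothetical executor loop shaped like the result dict shown above."""
    # "errored" is kept for shape only; this sketch never fills it.
    result = {"succeeded": [], "excepted": [], "errored": []}
    tracebacks = {}  # uri -> traceback text (what a stage record would keep)
    timings = {}     # uri -> seconds taken (what a cache record's data would keep)
    for uri in uris:
        start = time.perf_counter()
        try:
            execute(uri)
        except Exception:
            # Failed notebooks are not cached; remember the traceback instead.
            result["excepted"].append(uri)
            tracebacks[uri] = traceback.format_exc()
        else:
            result["succeeded"].append(uri)
            timings[uri] = time.perf_counter() - start
    return result, tracebacks, timings


def fake_execute(uri: str) -> None:
    """Pretend executor: any URI containing 'failing' raises."""
    if "failing" in uri:
        raise RuntimeError("a cell raised an exception")


result, tracebacks, timings = run_and_cache_sketch(
    ["basic.ipynb", "basic_failing.ipynb"], fake_execute
)
print(result)
```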

Successfully executed notebooks will be added to the cache, and data about their execution (such as time taken) will be stored in the cache record:

```python
cache.list_cache_records()
```

```
[NbCacheRecord(pk=1)]
```

```python
record = cache.get_cache_record(1)
record.to_dict()
```

```
{'hashkey': '818f3412b998fcf4fe9ca3cca11a3fc3',
 'description': '',
 'created': datetime.datetime(2020, 3, 13, 1, 15, 23, 896565),
 'uri': '/Users/cjs14/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb',
 'pk': 1,
 'data': {'execution_seconds': 1.4607883090000087},
 'accessed': datetime.datetime(2020, 3, 13, 1, 15, 23, 896578)}
```

Notebooks which failed to run will **not** be added to the cache,
but details about their execution (including the exception traceback)
will be added to the stage record:

```python
record = cache.get_staged_record(2)
print(record.traceback)
```

```
Traceback (most recent call last):
  File "/Users/cjs14/GitHub/jupyter-cache/jupyter_cache/executors/basic.py", line 152, in execute
    executenb(nb_bundle.nb, cwd=tmpdirname)
  File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 737, in executenb
    return ep.preprocess(nb, resources, km=km)[0]
  File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/anaconda/envs/mistune/lib/python3.7/site-packages/nbconvert/preprocessors/execute.py", line 448, in preprocess_cell
    raise CellExecutionError.from_cell_and_msg(cell, out)
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while ex
```

We now have two staged records and one cache record:

```python
print(tabulate_stage_records(
    cache.list_staged_records(), path_length=2, cache=cache
))
```

```
  ID  URI                              Created             Assets    Cache ID
----  -------------------------------  ----------------  --------  ----------
   2  example_nbs/basic_failing.ipynb  2020-03-13 01:15         0
   1  example_nbs/basic.ipynb          2020-03-13 01:15         0           1
```


```python
print(tabulate_cache_records(
    cache.list_cache_records(), path_length=1, hashkeys=True
))
```

```
  ID  Origin URI    Created           Accessed          Hashkey
----  ------------  ----------------  ----------------  --------------------------------
   1  basic.ipynb   2020-03-13 01:15  2020-03-13 01:15  818f3412b998fcf4fe9ca3cca11a3fc3
```
