# Reading Notebooks

* https://nbformat.readthedocs.io/en/latest/api.html

## Read a .ipynb file

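A notebook consists of notebook-level metadata, format information (`nbformat` and `nbformat_minor`), and a list of cells. `nbformat.read` parses the file into a `NotebookNode`, a dictionary whose keys can also be accessed as attributes (`content.metadata`, `content.cells`).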

In [119]:
import nbformat
from IPython.display import display  # available automatically inside Jupyter; imported here so the cell also runs elsewhere

# read the notebook file, converting it to nbformat version 4 if necessary
filename = "02.01-Tagging.ipynb"
with open(filename, "r", encoding="utf-8") as fp:
    content = nbformat.read(fp, as_version=4)

# display the format version and the notebook-level metadata
print(f"nbformat = {content.nbformat}.{content.nbformat_minor}")
display(content.metadata)

nbformat = 4.4


{'celltoolbar': 'Tags',
 'kernelspec': {'display_name': 'Python 3 (ipykernel)',
  'language': 'python',
  'name': 'python3'},
 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
  'file_extension': '.py',
  'mimetype': 'text/x-python',
  'name': 'python',
  'nbconvert_exporter': 'python',
  'pygments_lexer': 'ipython3',
  'version': '3.9.7'}}
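Because a `.ipynb` file is plain JSON, the same top-level structure can also be inspected with the standard-library `json` module. The sketch below parses a minimal, hypothetical v4 notebook held in a string; its contents are made up for illustration and are not taken from `02.01-Tagging.ipynb`.

```python
import json

# A minimal, hypothetical v4 notebook written out as raw JSON text.
RAW = """{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
    {"cell_type": "code", "execution_count": null,
     "metadata": {"tags": ["exercise"]}, "outputs": [], "source": "x = 1"}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 4
}"""

nb = json.loads(RAW)

# Top-level keys: format info, notebook metadata, and the list of cells.
print(sorted(nb.keys()))
# → ['cells', 'metadata', 'nbformat', 'nbformat_minor']
print(f"nbformat = {nb['nbformat']}.{nb['nbformat_minor']}")
# → nbformat = 4.4
print([cell["cell_type"] for cell in nb["cells"]])
# → ['markdown', 'code']
```

Reading through `nbformat` rather than `json` adds version conversion (`as_version=4`) and attribute-style access on top of this raw structure.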

## Loop over cells

Most of a notebook's content lives in its list of cells.

Here's how to loop over the cells. Cell metadata is editable in JupyterLab and can include a `tags` key holding a list of your own tags. Tags are a convenient way to mark cells you later want to remove.

In [111]:
for n, cell in enumerate(content.cells):
    print(f"\nCell {n}")
    print("    metadata:", cell.metadata)
    print("   cell_type:", cell.cell_type)
    print("      keys():", cell.keys())


Cell 0
    metadata: {}
   cell_type: markdown
      keys(): dict_keys(['cell_type', 'metadata', 'source'])

Cell 1
    metadata: {}
   cell_type: markdown
      keys(): dict_keys(['cell_type', 'metadata', 'source'])

Cell 2
    metadata: {'tags': ['differential-equations', 'SIR-model', 'compartmental-model']}
   cell_type: markdown
      keys(): dict_keys(['cell_type', 'metadata', 'source'])

Cell 3
    metadata: {'tags': ['scipy.integrate.solve_ivp', 'differential-equations']}
   cell_type: code
      keys(): dict_keys(['cell_type', 'execution_count', 'metadata', 'outputs', 'source'])

Cell 4
    metadata: {}
   cell_type: markdown
      keys(): dict_keys(['cell_type', 'metadata', 'source'])

Cell 5
    metadata: {'tags': ['home-activity', 'differential-equations']}
   cell_type: markdown
      keys(): dict_keys(['cell_type', 'metadata', 'source'])

Cell 6
    metadata: {'tags': ['class-activity']}
   cell_type: markdown
      keys(): dict_keys(['cell_type', 'metadata', 'source'])
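Since tags are just a list under `metadata`, a tally of how often each tag is used can be built with `collections.Counter`. This is a sketch: the `tag_counts` helper is hypothetical, and the sample cells are plain dicts holding the tags printed above. `nbformat` cells are dict subclasses, so the same `.get` calls should also work on `content.cells`.

```python
from collections import Counter

# The tagged cells' metadata from the loop output above, copied into plain dicts.
cells = [
    {"metadata": {}},
    {"metadata": {"tags": ["differential-equations", "SIR-model", "compartmental-model"]}},
    {"metadata": {"tags": ["scipy.integrate.solve_ivp", "differential-equations"]}},
    {"metadata": {"tags": ["home-activity", "differential-equations"]}},
    {"metadata": {"tags": ["class-activity"]}},
]

def tag_counts(cells):
    """Count how many cells carry each tag; untagged cells contribute nothing."""
    counts = Counter()
    for cell in cells:
        counts.update(cell.get("metadata", {}).get("tags", []))
    return counts

print(tag_counts(cells).most_common(1))
# → [('differential-equations', 3)]
```

With the notebook loaded, `tag_counts(content.cells)` would give the same kind of tally for the real file.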



## Remove Code Elements

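Remove code segments delimited by marker comments such as `### BEGIN SOLUTION` and `### END SOLUTION`. A regular expression identifies these segments inside code cells. Because `re.sub` accepts any replacement string, the same function can substitute text (for example a placeholder comment) instead of simply deleting it.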

In [112]:
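import re

# Non-greedy (.*?) so that a cell containing several marked blocks has each block
# removed separately, rather than everything from the first BEGIN to the last END.
SOLUTION_CODE = "### BEGIN SOLUTION(.*?)### END SOLUTION"
HIDDEN_TESTS = "### BEGIN HIDDEN TESTS(.*?)### END HIDDEN TESTS"

def replace_code(pattern, replacement):
    """Replace every match of `pattern` in the code cells of `content` with `replacement`."""
    regex = re.compile(pattern, re.DOTALL)  # DOTALL lets . match newlines inside a block
    for cell in content.cells:
        if cell.cell_type == "code" and regex.search(cell.source):
            cell.source = regex.sub(replacement, cell.source)
            print(f" - {pattern} removed")

replace_code(SOLUTION_CODE, "")
replace_code(HIDDEN_TESTS, "")

 - ### BEGIN SOLUTION(.*?)### END SOLUTION removed
 - ### BEGIN HIDDEN TESTS(.*?)### END HIDDEN TESTS removed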
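The `(.*)` group in a pattern like `### BEGIN SOLUTION(.*)### END SOLUTION` is greedy: with `re.DOTALL`, a cell containing two solution blocks matches once, from the first `BEGIN` to the last `END`, swallowing the code in between. The non-greedy `(.*?)` stops at the nearest `END`. The sketch below shows both on a hypothetical cell source and also uses a placeholder replacement string (an arbitrary choice here) to illustrate substitution rather than deletion.

```python
import re

# A hypothetical code cell containing two separate solution blocks.
source = (
    "def square(x):\n"
    "    ### BEGIN SOLUTION\n"
    "    return x * x\n"
    "    ### END SOLUTION\n"
    "\n"
    "def cube(x):\n"
    "    ### BEGIN SOLUTION\n"
    "    return x ** 3\n"
    "    ### END SOLUTION\n"
)

greedy = re.compile(r"### BEGIN SOLUTION(.*)### END SOLUTION", re.DOTALL)
lazy = re.compile(r"### BEGIN SOLUTION(.*?)### END SOLUTION", re.DOTALL)

# Substitute a placeholder comment instead of deleting the blocks.
greedy_result = greedy.sub("# YOUR CODE HERE", source)
lazy_result = lazy.sub("# YOUR CODE HERE", source)

print(greedy_result)  # a single match spans both blocks, so def cube(x) is gone
print(lazy_result)    # each BEGIN/END pair is replaced on its own
```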


## Remove cells with a specified tag

Note the use of a generator: it yields matching cells one at a time instead of building a list up front. A generator is lazy and can be iterated only once, so wrap it in `list()` when you need the tagged cells as a list.

In [113]:
# a generator that yields every markdown cell whose metadata tags include `tag`
def get_cells(tag):
    for cell in content.cells:
        if cell.cell_type == "markdown" and tag in cell.metadata.get("tags", []):
            yield cell

tagged_cells = list(get_cells('exercise'))
tagged_cells

[{'cell_type': 'markdown',
  'metadata': {'tags': ['exercise']},
  'source': '### Exercise 1.\n\nIn the following cell write a function that returns the square of a number.'}]
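The generator behaviour described above can be seen in isolation. In this sketch, `tagged` is a hypothetical stand-in for `get_cells` that takes the cell list as an argument and, unlike `get_cells`, does not filter on cell type; the sample cells are made up.

```python
def tagged(cells, tag):
    """Hypothetical generator: yield cells whose metadata tags include `tag`."""
    for cell in cells:
        if tag in cell.get("metadata", {}).get("tags", []):
            yield cell

cells = [
    {"cell_type": "markdown", "metadata": {"tags": ["exercise"]}, "source": "Exercise 1"},
    {"cell_type": "code", "metadata": {}, "source": "x = 1"},
    {"cell_type": "markdown", "metadata": {"tags": ["exercise"]}, "source": "Exercise 2"},
]

gen = tagged(cells, "exercise")
first = next(gen)   # runs only until the first match is found
rest = list(gen)    # list() drains whatever the generator has left
again = list(gen)   # a generator is single-use, so this is empty

print(first["source"], [c["source"] for c in rest], again)
# → Exercise 1 ['Exercise 2'] []
```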

In [118]:
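# remove all markdown cells carrying the specified tag (as selected by get_cells)
def remove_cells(tag):
    tagged_cells = list(get_cells(tag))
    if tagged_cells:
        print(f" - removing {len(tagged_cells)} cell(s) with tag {tag}")
        # compare by identity so that exactly the tagged cell objects are dropped
        tagged_ids = {id(cell) for cell in tagged_cells}
        content.cells = [cell for cell in content.cells if id(cell) not in tagged_ids]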

remove_cells('exercise')
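An alternative is to filter in a single pass, keeping every cell whose tags do not include the target tag, instead of first collecting the tagged cells and then checking each cell against that list. `remove_tagged` below is a hypothetical helper, and it assumes you want tagged cells of any type removed, whereas `get_cells` only considers markdown cells.

```python
def remove_tagged(cells, tag):
    """Return (kept_cells, n_removed), dropping every cell whose tags include `tag`."""
    kept = [cell for cell in cells
            if tag not in cell.get("metadata", {}).get("tags", [])]
    return kept, len(cells) - len(kept)

# Hypothetical sample cells: two tagged 'exercise' (one markdown, one code), one untagged.
cells = [
    {"cell_type": "markdown", "metadata": {"tags": ["exercise"]}, "source": "### Exercise 1."},
    {"cell_type": "code", "metadata": {"tags": ["exercise", "solution"]}, "source": "x = 1"},
    {"cell_type": "markdown", "metadata": {}, "source": "Notes"},
]

kept, n_removed = remove_tagged(cells, "exercise")
print(n_removed, [c["source"] for c in kept])
# → 2 ['Notes']
```

Applied to the loaded notebook, this would read `content.cells, n = remove_tagged(content.cells, "exercise")`.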

## Write file out

In [117]:
with open("out.ipynb", "w") as fp:
    nbformat.write(content, fp)
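Before overwriting a file, it can be worth validating the edited notebook explicitly and checking that it survives a write/read round trip. The sketch below does this entirely in memory with `nbformat.validate`, `nbformat.writes`, and `nbformat.reads`; the two-cell notebook is hypothetical, built with the `nbformat.v4` helper constructors.

```python
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

# Build a small, hypothetical notebook in memory.
nb = new_notebook(cells=[
    new_markdown_cell("### Exercise 1.", metadata={"tags": ["exercise"]}),
    new_code_cell("x = 1"),
])

nbformat.validate(nb)                      # raises nbformat.ValidationError if invalid
text = nbformat.writes(nb)                 # the JSON string that write() would save
nb2 = nbformat.reads(text, as_version=4)   # parse it back, as read() does for a file

print(nb2.nbformat, [c.cell_type for c in nb2.cells])
# → 4 ['markdown', 'code']
```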