In [None]:
# default_exp converter

# Converter
> Jupyter notebooks need to be turned into either Markdown or HTML documents to be compatible with the Medium API. Good tools for this already exist, such as Jupyter's `nbconvert` or nbdev's `nbdev_nb2md`, so we are not going to reinvent the wheel.

In [None]:
# hide
from nbdev.showdoc import *

In [None]:
from nbconvert import MarkdownExporter, FilenameExtension
from nbconvert.writers import FilesWriter

## Jupyter nbconvert

Jupyter's `nbconvert` is a well-established tool created and maintained by the core Jupyter developers, so we will use its Python API to convert Jupyter notebooks to Markdown documents. You can read more about `nbconvert`'s functionality in the [official documentation](https://nbconvert.readthedocs.io/en/latest/nbconvert_library.html).

We will be using an `Exporter`, namely the `MarkdownExporter`, which can read a Jupyter notebook and extract the main body (text) and resources (images, etc.). Let's first see the basics of how it works and then make a thin wrapper function around it.

In [None]:
m = MarkdownExporter()

In [None]:
body, resources = m.from_filename('LEARNING/test-notebook.ipynb')

All notebook exporters return a tuple containing the body and the resources of the document. For instance, the matplotlib image from our test notebook was stored as `output_4_1.png`:

In [None]:
resources['outputs']['output_4_1.png'];
resources['outputs'].keys()

dict_keys(['output_4_1.png'])

Note that, so far, the Markdown representation of the notebook only exists as a Python object; no files have been written.

In [None]:
def nb2md_draft(notebook:str):
    """
    Paper thin wrapper around nbconvert.MarkdownExporter. This function takes the path to a jupyter
    notebook and passes it to `MarkdownExporter().from_filename` which returns the body and resources
    of the document
    """
    m = MarkdownExporter()
    body, resources = m.from_filename(notebook)
    return body, resources

This is a very basic notebook-to-Markdown converter. Further down in this module it is improved with preprocessors that add features, so this function is not the one that will be exported as `nb2md()`, which is why it is marked as a draft.

In [None]:
b, r = nb2md_draft('LEARNING/test-notebook.ipynb')

## Writing Notebook to file

We use the `FilesWriter` object to write the resulting Markdown file to disk. We can set the `build_directory` attribute ([see more Writer options](https://nbconvert.readthedocs.io/en/latest/config_options.html#writer-options)) to indicate where we would like to store the converted notebook and its auxiliary files (images, etc.). The `FilesWriter` is "aggressive", meaning it will overwrite whatever files exist if there is a directory or filename clash. Lastly, it is also possible to write a custom Writer, such as a `MediumWriter` that renders the document and then uploads it to Medium, but because I am learning I'd rather see every step in the pipeline.

In [None]:
f = FilesWriter(build_directory = 'Rendered/')

Conveniently, the `write()` method of `FilesWriter` returns the output path.

In [None]:
f.write(output = body, 
        resources = resources,
        notebook_name = 'test-notebook')

'Rendered/test-notebook.md'

### Simple writing function

In [None]:
#export
def WriteMarkdown(body, resources, dir_path = None, filename = None):
    """
    body & resources are the output of any Jupyter nbconvert `Exporter`.
    dir_path should be a path relative to the current working directory.
    If dir_path is not passed, the output document and its auxiliary files are written
    to the same location as the input Jupyter notebook.
    filename is the output document's name, without extension.

    Returns the location of the newly written file.
    """
    return FilesWriter(build_directory = '' if dir_path is None else dir_path) \
    .write(
        output = body,
        resources = resources,
        notebook_name = filename
    )

#### Example 1 - Write to the notebook's directory

In [None]:
WriteMarkdown(body, resources, filename = 'test-notebook')

'LEARNING/test-notebook.md'

#### Example 2 - Write to new directory

In [None]:
WriteMarkdown(body, resources, dir_path= 'Docs', filename= 'test-notebook')

'Docs/test-notebook.md'

#### Example 3 - Write to directory with subdirectory

In [None]:
WriteMarkdown(body, resources, dir_path= 'Docs/Attempt1', filename= 'test-notebook')

'Docs/Attempt1/test-notebook.md'

#### Example 4 - Write outside the current working directory

In [None]:
WriteMarkdown(body, resources, dir_path= '../Docs', filename= 'test-notebook')

'../Docs/test-notebook.md'
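
The four examples suggest a simple rule for where the file ends up. The sketch below models that rule in plain Python so it can be tried without `nbconvert`. `resolve_output_path` is a hypothetical helper invented for illustration, not part of `nbconvert`, and it assumes the notebook's own directory is passed in as `notebook_dir` (in the real pipeline that information travels inside `resources`).

```python
import posixpath  # forward slashes, matching the outputs shown above


def resolve_output_path(notebook_dir, filename, dir_path=None, extension=".md"):
    """Hypothetical model of where WriteMarkdown places the document."""
    # An empty or missing dir_path falls back to the notebook's own directory.
    target_dir = dir_path or notebook_dir
    return posixpath.join(target_dir, filename + extension)


print(resolve_output_path("LEARNING", "test-notebook"))
print(resolve_output_path("LEARNING", "test-notebook", dir_path="Docs/Attempt1"))
print(resolve_output_path("LEARNING", "test-notebook", dir_path="../Docs"))
```

This is only a model of the observed behaviour; the actual writer also creates missing directories and copies the auxiliary files.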

In [None]:
# hide
!rm -rf Docs/ Rendered/ ../Docs/ LEARNING/test-notebook.md LEARNING/output_4_1.png

## Handling special tags

### Hide tags - Remove cell if cell has no output

We may wish certain Markdown or code cells not to be present in the output document. To achieve this we can use `nbconvert`'s [`RegexRemovePreprocessor`](https://nbconvert.readthedocs.io/en/latest/removing_cells.html#removing-cells-using-regular-expressions-on-cell-content). Preprocessors such as this one can either be registered to an `Exporter` (see [how](https://nbconvert.readthedocs.io/en/latest/api/exporters.html#nbconvert.exporters.Exporter.register_preprocessor)) or passed as part of a config (see [how](https://nbconvert.readthedocs.io/en/latest/removing_cells.html#removing-pieces-of-cells-using-cell-tags)).

In [None]:
#export 
from nbconvert.preprocessors import RegexRemovePreprocessor

m = MarkdownExporter()
# Raw string so that `\s` reaches the regex engine without an invalid-escape warning
m.register_preprocessor(RegexRemovePreprocessor(patterns = [r'^#\s*hide-cell']), enabled = True);

__Funnily enough__, the `RegexRemovePreprocessor` [only hides cells that have the tag AND that do not produce an output](https://github.com/jupyter/nbconvert/issues/1091). For example:
```python 
#hide-cell
a = 1
```
would be removed, but:
```python 
#hide-cell
a = 1
print(a) # or simply a
```
would _not_ be removed.
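
The rule described above can be modelled without `nbconvert` at all. The sketch below uses plain dictionaries as stand-ins for notebook cells. `would_be_removed` is a hypothetical helper written for illustration; it only mirrors the behaviour reported in the linked issue and is not the library's actual code.

```python
import re

HIDE_PATTERN = re.compile(r"^#\s*hide-cell")


def would_be_removed(cell):
    """Return True when a cell matches the pattern AND has no outputs."""
    matches = HIDE_PATTERN.match(cell["source"]) is not None
    has_outputs = bool(cell.get("outputs"))
    return matches and not has_outputs


# Stand-ins for the two cells discussed above.
silent = {"source": "#hide-cell\na = 1", "outputs": []}
noisy = {"source": "#hide-cell\na = 1\nprint(a)", "outputs": [{"text": "1\n"}]}

print(would_be_removed(silent))  # True
print(would_be_removed(noisy))   # False
```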

### Clear Output - Remove cell's output but keep cell's content

The standard preprocessors aren't really useful for what I want to do. [`RegexRemovePreprocessor`](https://github.com/jupyter/nbconvert/blob/master/nbconvert/preprocessors/regexremove.py) only removes cells that have no output in addition to matching the specified pattern(s). The [`ClearOutputPreprocessor`](https://github.com/jupyter/nbconvert/blob/master/nbconvert/preprocessors/clearoutput.py) removes all outputs from a notebook. Hence I am going to write a custom preprocessor that can hide a cell's source, a cell's output or the whole cell, based on pattern matching performed on the cell's source. After some investigation I realised that the best way to achieve this was using [cell tags](https://stackoverflow.com/a/48084050/12821043), though I do not like Jupyter's current tag environment: you have to go through the GUI to add tags to a cell, navigating to the top menu bar, then the **View** section, then the **Cell Toolbar** sub-section, and finally clicking on **Tags**, which adds a chunky extra section to all your cells, even those you do not want to tag. *Hence* I've gone for an implementation that allows both the use of tags and the use of text/regex-based tagging in the custom preprocessor `HidePreprocessor` written below.

In [None]:
#export 
from nbconvert.preprocessors import Preprocessor, TagRemovePreprocessor
from traitlets import List, Unicode, Set
import re

class HidePreprocessor(Preprocessor):
    """
    Preprocessor that tags cells whose source matches one of `patterns`.

    Depending on `mode`, a matching cell is tagged `hide-source`, `hide-output`
    or `hide-cell`; a `TagRemovePreprocessor` then does the actual removal.

    Regex matching is based on the [RegexRemovePreprocessor source]
    (https://github.com/jupyter/nbconvert/blob/master/nbconvert/preprocessors/regexremove.py)
    """

    mode = Unicode().tag(config = True) # one of 'source', 'output', 'cell' or 'all'
    patterns = List(Unicode(), default_value=[]).tag(config=True)
    remove_metadata_fields = Set(
        {'collapsed', 'scrolled'}
    ).tag(config=True)

    def check_conditions(self, cell):
        """
        Checks whether the start of a cell's source matches one of the patterns.
        Returns: Boolean.
        True means the cell should be tagged.
        """

        # Compile all the patterns into one: each pattern is first wrapped
        # by a non-capturing group to ensure the correct order of precedence
        # and the patterns are joined with a logical or
        pattern = re.compile('|'.join('(?:%s)' % pattern
                             for pattern in self.patterns))

        # `match` anchors at the start of the source, so the tag must come first
        return pattern.match(cell.source) is not None
    
    def preprocess_cell(self, cell, resources, cell_index):
        """
        Preprocessing to apply to each cell.
        """
        # Skip preprocessing if the list of patterns is empty
        if not self.patterns:
            return cell, resources
        
        if self.mode == 'source': 
            cell, resources = self.hide_source(cell, resources)
        elif self.mode == 'output': 
            cell, resources = self.hide_output(cell, resources)
        elif self.mode == 'cell' or self.mode == 'all':
            cell, resources = self.hide_cell(cell, resources)
        
        return cell, resources
    
    def hide_source(self, cell, resources):
        
        if self.check_conditions(cell):
            cell.metadata.tags = ['hide-source']
            
        return cell, resources
        
    def hide_output(self, cell, resources):
        
        if cell.cell_type == 'code' and self.check_conditions(cell):
            cell.metadata.tags = ['hide-output']
                    
        return cell, resources
    
    def hide_cell(self, cell, resources):
        
        if self.check_conditions(cell):
            cell.metadata.tags = ['hide-cell']
            
        return cell, resources
    

`nbconvert` uses [`traitlets`](https://github.com/ipython/traitlets) where I would normally expect an `__init__()` method. Luckily it is quite intuitive to work with traitlets but I do not grasp the pros and cons of using it.
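
One detail of `check_conditions` is worth trying in isolation: the patterns are joined into a single alternation and applied with `match`, which only looks at the very start of the cell source. The sketch below uses only the standard library, and the sample cell sources are made up for illustration.

```python
import re

patterns = [r"^#\s*hide-source", r"^#\s*hide-output"]

# Wrap each pattern in a non-capturing group, then join with a logical OR.
combined = re.compile("|".join("(?:%s)" % p for p in patterns))

first_line_tag = "# hide-output\nprint('My name is Jack')"
second_line_tag = "print('My name is Jack')\n# hide-output"

print(bool(combined.match(first_line_tag)))   # True
print(bool(combined.match(second_line_tag)))  # False
```

A tag is therefore only recognised when it is the first line of the cell. Compiling with `re.MULTILINE` and using `search` would be one way to relax that, at the cost of also matching comments further down the cell.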

In [None]:
m = MarkdownExporter()
# Raw strings keep `\s` intact for the regex engine
m.register_preprocessor(HidePreprocessor(mode = 'source', patterns = [r'^#\s*hide-source']), enabled = True)
m.register_preprocessor(HidePreprocessor(mode = 'output', patterns = [r'^#\s*hide-output']), enabled = True)
m.register_preprocessor(HidePreprocessor(mode = 'cell', patterns = [r'^#\s*hide-cell']), enabled = True)
m.register_preprocessor(
    TagRemovePreprocessor(
        remove_input_tags = ('hide-source',),
        remove_all_outputs_tags = ('hide-output',),
        remove_cell_tags = ('hide-cell',),
        enabled = True)
)

<nbconvert.preprocessors.tagremove.TagRemovePreprocessor at 0x7ff9a58ba9a0>

The file `test-hiding.ipynb` contains 4 cells printing the string 'My name is Jack'. The first one has no tags added. The second one has the `#hide-source` tag which results in only the output string being present in the Markdown document. The third cell has the `#hide-output` tag added to it which results in only the cell source ("the code") being present in the Markdown document. The last cell has the `#hide-cell` tag which removes the whole cell (source and output) altogether.

In [None]:
b, r = m.from_filename('LEARNING/test-hiding.ipynb')

In [None]:
print(b)


## Hiding cells              


```python
print('My name is Jack')
```

    My name is Jack


Same as above but hiding source (aka input), output and all (aka whole cell)

### The source of the next cell is hidden

    My name is Jack


### The output of the next cell is hidden


```python
#hide-output
print('My name is Jack')
```

### The entire next cell is hidden



The section above explored how to hide cell sources, cell outputs and entire cells using text-based tags. These will be added to the main `nb2md()` function at the end of this module.

### Gister tags

I like syntax highlighting in Medium articles and, to my knowledge, this is only available via GitHub Gists. We will be making our own [preprocessor](https://nbconvert.readthedocs.io/en/latest/nbconvert_library.html#Using-different-preprocessors) that uploads the source code of cells that start with the special tag `# gist`. We will be using the Python module [`ghapi`](https://ghapi.fast.ai/) to upload gists.

#### Uploading gists via `ghapi`

In [None]:
import ghapi
from ghapi.auth import GhDeviceAuth, github_auth_device
from ghapi.all import GhApi

In [None]:
ghapi.__version__

'0.1.16'

In [None]:
import os
def check_gh_auth():
    # An existing token is enough; only start GitHub's device flow when none is set
    if os.getenv('GITHUB_TOKEN'):
        return True
    # auth() returns None while the user has not authenticated (see github_auth_device())
    return GhDeviceAuth().auth() is not None

In [None]:
api = GhApi(owner='lc5415')

In [None]:
api.gists.create(description='ghapi-test.md', files= 'CONTRIBUTING.md', public = False)

HTTP404NotFoundError: HTTP Error 404: Not Found
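
For reference, GitHub's REST endpoint for creating a gist expects `files` to be a mapping from file name to an object with a `content` key, rather than a bare path. The sketch below only builds that mapping from local files and does not contact GitHub. `build_gist_files` is a hypothetical helper invented for this example, and whether passing its result to `api.gists.create` resolves the 404 above is untested here, since that error may equally come from missing authentication.

```python
import tempfile
from pathlib import Path


def build_gist_files(*paths):
    """Map each file's name to {'content': <file text>} for the gist payload."""
    return {
        Path(p).name: {"content": Path(p).read_text(encoding="utf-8")}
        for p in paths
    }


# Demonstrate on a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.md"
    sample.write_text("# Hello gist\n", encoding="utf-8")
    payload = build_gist_files(sample)

print(payload)  # {'example.md': {'content': '# Hello gist\n'}}
```

With an authenticated client, the result could then be tried as `api.gists.create(description='ghapi-test.md', files=build_gist_files('CONTRIBUTING.md'), public=False)`.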

In [None]:
#export
import keyring
from ghapi.all import GhApi

class GisterProcessor(Preprocessor):
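    """
    Preprocessor that detects the presence of the #gister tag and uploads the cell
    as a gist to the user's GitHub account.

    Work in progress: the body below is still a copy of `HidePreprocessor` and no upload happens yet.
    """
    
    mode = Unicode().tag(config = True) # one of 'source', 'output', 'cell' or 'all'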
    patterns = List(Unicode(), default_value=[]).tag(config=True)
    remove_metadata_fields = Set(
        {'collapsed', 'scrolled'}
    ).tag(config=True)

    def check_conditions(self, cell):
        """
        Checks whether the start of a cell's source matches one of the patterns.
        Returns: Boolean.
        True means the cell should be tagged.
        """

        # Compile all the patterns into one: each pattern is first wrapped
        # by a non-capturing group to ensure the correct order of precedence
        # and the patterns are joined with a logical or
        pattern = re.compile('|'.join('(?:%s)' % pattern
                             for pattern in self.patterns))

        # `match` anchors at the start of the source, so the tag must come first
        return pattern.match(cell.source) is not None
    
    def preprocess_cell(self, cell, resources, cell_index):
        """
        Preprocessing to apply to each cell.
        """
        # Skip preprocessing if the list of patterns is empty
        if not self.patterns:
            return cell, resources
        
        if self.mode == 'source': 
            cell, resources = self.hide_source(cell, resources)
        elif self.mode == 'output': 
            cell, resources = self.hide_output(cell, resources)
        elif self.mode == 'cell' or self.mode == 'all':
            cell, resources = self.hide_cell(cell, resources)
        
        return cell, resources
    
    def hide_source(self, cell, resources):
        
        if self.check_conditions(cell):
            cell.metadata.tags = ['hide-source']
            
        return cell, resources
        
    def hide_output(self, cell, resources):
        
        if cell.cell_type == 'code' and self.check_conditions(cell):
            cell.metadata.tags = ['hide-output']
                    
        return cell, resources
    
    def hide_cell(self, cell, resources):
        
        if self.check_conditions(cell):
            cell.metadata.tags = ['hide-cell']
            
        return cell, resources

### Image (no tags needed)

## Wrapping preprocessors and the Exporter

In [None]:
#export
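def register_preprocessors(markdown_exporter, preprocessors = None):
    """
    Registers each of the given preprocessors on `markdown_exporter` and returns the exporter.
    """
    if preprocessors is None:
        # Assumed default: the "hide cell with no output" preprocessor shown earlier in this module
        preprocessors = [RegexRemovePreprocessor(patterns = [r'^#\s*hide-cell'])]
    for p in preprocessors:
        # Register on the exporter that was passed in, not on the global `m`
        markdown_exporter.register_preprocessor(p, enabled = True)
    return markdown_exporter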
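
The registration loop can be exercised without a real exporter. The sketch below uses a hypothetical `RecordingExporter` stand-in that only mimics the `register_preprocessor(preprocessor, enabled=...)` call used above, plus a helper named `register_all` invented for this example. It illustrates two points: operating on the exporter that was passed in, and using `None` instead of a list as the default argument.

```python
class RecordingExporter:
    """Stand-in for an nbconvert exporter; it only records registrations."""

    def __init__(self):
        self.registered = []

    def register_preprocessor(self, preprocessor, enabled=False):
        self.registered.append((preprocessor, enabled))


def register_all(exporter, preprocessors=None):
    """Register every preprocessor on `exporter` and return the exporter."""
    for p in preprocessors or []:
        exporter.register_preprocessor(p, enabled=True)
    return exporter


# Strings stand in for preprocessor instances here.
exporter = register_all(RecordingExporter(), ["hide-source", "hide-output"])
print(exporter.registered)  # [('hide-source', True), ('hide-output', True)]
```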