# HTML of a notebook back to .ipynb file

Based on https://stackoverflow.com/a/47138762/8508004

-----

<div class="alert alert-block alert-warning">
<p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!</p>

<p>
    Some tips:
    <ul>
        <li>Code cells have boxes around them.</li>
        <li>To run a code cell, click on the cell and either click the <i class="fa-play fa"></i> button on the toolbar above or hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook. Selecting <b>Cell</b> > <b>Run All</b> from the menu above the toolbar is a shortcut for running all the cells in the notebook.</li>
        <li>While a cell is running, a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running, the asterisk will be replaced with a number.</li>
        <li>In most cases you'll want to start from the top of the notebook and work your way down, running each cell in turn. Later cells might depend on the results of earlier ones.</li>
        <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>
    </ul>
</p>
</div>

----

## Preparation

Note: besides the necessary packages needing to be installed, the other difference from [the original code](https://stackoverflow.com/a/47138762/8508004) is that the link used in the example as a source of HTML is no longer valid. I'm going to substitute another one of Jake VanderPlas's notebooks. Specifically, instead of the link `http://nbviewer.jupyter.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb`, I'm using `https://nbviewer.org/github/jakevdp/2014_fall_ASTR599/blob/master/notebooks/03_IPython_intro.ipynb`.

The code cell below will take that script, with the URL changed, and save it here in the working directory. Run this cell:

In [1]:
t = '''from bs4 import BeautifulSoup
import json
import urllib.request
url = 'https://nbviewer.org/github/jakevdp/2014_fall_ASTR599/blob/master/notebooks/03_IPython_intro.ipynb'
response = urllib.request.urlopen(url)
#  for local html file
# response = open("/Users/note/jupyter/notebook.html")
text = response.read()

soup = BeautifulSoup(text, 'lxml')
# see some of the html
print(soup.div)
dictionary = {'nbformat': 4, 'nbformat_minor': 1, 'cells': [], 'metadata': {}}
for d in soup.findAll("div"):
    if 'class' in d.attrs.keys():
        for clas in d.attrs["class"]:
            if clas in ["text_cell_render", "input_area"]:
                # code cell
                if clas == "input_area":
                    cell = {}
                    cell['metadata'] = {}
                    cell['outputs'] = []
                    cell['source'] = [d.get_text()]
                    cell['execution_count'] = None
                    cell['cell_type'] = 'code'
                    dictionary['cells'].append(cell)

                else:
                    cell = {}
                    cell['metadata'] = {}

                    cell['source'] = [d.decode_contents()]
                    cell['cell_type'] = 'markdown'
                    dictionary['cells'].append(cell)
# write the collected cells out as a notebook file, closing it when done
with open('notebook.ipynb', 'w') as f:
    json.dump(dictionary, f)
'''
%store t >back_to_the_ipynb_script.py

Writing 't' (str) to file 'back_to_the_ipynb_script.py'.


You should now see `back_to_the_ipynb_script.py` listed in the file browser on the left.

In [2]:
%pip install beautifulsoup4
%pip install lxml

Note: you may need to restart the kernel to use updated packages.
Collecting lxml
  Downloading lxml-4.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.4/6.4 MB 15.6 MB/s eta 0:00:00
Installing collected packages: lxml
Successfully installed lxml-4.9.1
Note: you may need to restart the kernel to use updated packages.


(When I installed `lxml` separately from `beautifulsoup4` during a development round, I found I needed to restart the kernel before it got used. Your experience may vary. In fact, when I ran this demonstration notebook fresh and installed both in one cell, it worked without restarting the kernel. However, restarting the kernel and trying again is something to keep in mind if you get a note about `lxml` when you try to run the code cell below.)
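
If restarting is inconvenient, one option is to check up front whether `lxml` can be found and otherwise fall back to Python's built-in `html.parser`. The following is only a minimal sketch and is not part of the original script; `pick_parser` is a name I made up for illustration. The two parsers can treat messy HTML differently, so the resulting notebook may not be identical.

```python
import importlib.util


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return the name of a parser to hand to BeautifulSoup.

    `pick_parser` is a hypothetical helper for illustration only.
    """
    # find_spec returns None when the top-level module cannot be located
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


parser_name = pick_parser()
print(f"Parser to pass to BeautifulSoup: {parser_name}")
```

In the script, that would mean replacing `BeautifulSoup(text, 'lxml')` with `BeautifulSoup(text, pick_parser())`.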

## Basic Usage Example from the source

The notebook used here differs from the one in the original example, as noted under the Preparation section.

In [3]:
%run back_to_the_ipynb_script.py

<div class="container">
<div class="navbar-header">
<button class="navbar-toggle collapsed" data-target=".navbar-collapse" data-toggle="collapse" type="button">
<span class="sr-only">Toggle navigation</span>
<i class="fa fa-bars"></i>
</button>
<a class="navbar-brand" href="/">
<img src="/static/img/nav_logo.svg" width="159"/>
</a>
</div>
<div class="collapse navbar-collapse">
<ul class="nav navbar-nav navbar-right">
<li>
<a class="active" href="http://jupyter.org">JUPYTER</a>
</li>
<li>
<a href="/faq" title="FAQ">
<span>FAQ</span>
</a>
</li>
<li>
<a href="slides/github/jakevdp/2014_fall_ASTR599/blob/master/notebooks/03_IPython_intro.ipynb" title="View as Slides">
<span class="fa fa-gift fa-2x menu-icon"></span>
<span class="menu-text">View as Slides</span>
</a>
</li>
<li>
<a href="script/github/jakevdp/2014_fall_ASTR599/blob/master/notebooks/03_IPython_intro.ipynb" title="View as Code">
<span class="fa fa-code fa-2x menu-icon"></span>
<span class="menu-text">View as Code</span>
</a>
<

Because of the `print(soup.div)` line, the script prints the first `div` of the fetched page, shown above, as an indicator of an early step in the process. That isn't the result, though, and you can feel free to comment out that line so it doesn't show.

To look at the result, double-click on `notebook.ipynb` in the file listing panel on the left to actually open it as a Jupyter notebook.
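
If you'd like a quick check before opening it, a notebook file is just JSON, so the cells can be counted by type with the standard library. This is a rough sketch, not part of the original script: `summarize_notebook` is a made-up helper, and the `demo` dictionary is a stand-in with the same top-level keys the script writes, so the example runs even if `notebook.ipynb` doesn't exist yet.

```python
import json
import os
import tempfile
from collections import Counter


def summarize_notebook(path):
    """Count cells by type in a notebook file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    return Counter(cell["cell_type"] for cell in nb["cells"])


# Stand-in notebook with the same layout the script produces,
# so this example does not depend on `notebook.ipynb` being present.
demo = {
    "nbformat": 4,
    "nbformat_minor": 1,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["<h1>Title</h1>"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["print('hi')"]},
    ],
}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.ipynb")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump(demo, f)

counts = summarize_notebook(demo_path)
print(dict(counts))
```

For the real output, you'd call `summarize_notebook('notebook.ipynb')` after running the script.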

You'll see it isn't perfect, for example, with regard to whitespace in code cells; however, it is pretty good. In fact, if you look closely, you can see this whitespace issue in the post about this converter [here](https://stackoverflow.com/a/47138762/8508004). This is that image:

![the example at SO](https://i.stack.imgur.com/1SMvZ.png)
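
One possible way to tidy the whitespace, assuming the extra space is mostly blank lines surrounding each code cell's text plus trailing spaces, is to clean the string before storing it as the cell's `source`. This is just a sketch under that assumption; `clean_source` is a made-up helper and isn't in the original script. Indentation at the start of lines is left alone.

```python
def clean_source(text):
    """Strip surrounding blank lines and split into notebook-style lines.

    Hypothetical helper: returns a list of lines where every line
    except the last ends with a newline, as notebook JSON usually stores them.
    """
    lines = text.splitlines()
    # drop blank lines at the start and the end, keep interior ones
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    # remove trailing spaces only, so indentation is preserved
    lines = [line.rstrip() for line in lines]
    return [line + "\n" for line in lines[:-1]] + lines[-1:]


raw = "\n\nimport numpy as np   \n\nx = np.arange(3)\n\n"
print(clean_source(raw))
```

In the script, this would amount to using `cell['source'] = clean_source(d.get_text())` for code cells.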

Personally, I would have used [nbformat](https://stackoverflow.com/a/71244733/8508004) to handle collecting the cells and generating the notebook. You can find several examples of my use of it [here](https://stackoverflow.com/search?tab=newest&q=user%3a8508004%20nbformat) and [here](https://discourse.jupyter.org/search?q=nbformat%20%40fomightez%20order%3Alatest). However, I don't know if it would improve the readability of the script or its ultimate functionality at this point, and so maybe another time.
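
To give a rough idea of what the nbformat route could look like, here is a minimal sketch. It is not the original script's method, and the `collected` list is a hypothetical stand-in for whatever the HTML parsing loop gathers. It assumes `nbformat` is installed, which is typically the case wherever Jupyter itself is.

```python
import nbformat
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

# (kind, text) pairs standing in for what the HTML parsing loop would collect
collected = [
    ("markdown", "<h1>Title</h1>"),
    ("code", "print('hello')"),
]

nb = new_notebook()
for kind, text in collected:
    if kind == "code":
        nb.cells.append(new_code_cell(text))
    else:
        nb.cells.append(new_markdown_cell(text))

# raises a validation error if the structure does not match the notebook schema
nbformat.validate(nb)

# serialize to a JSON string; nbformat.write(nb, 'notebook.ipynb') would save to disk instead
as_json = nbformat.writes(nb)
print(len(nb.cells), "cells serialized")
```

The main gain would be that nbformat fills in the required cell fields and can validate the notebook, rather than the script building each dictionary by hand.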

----

Enjoy!