# SUMMARY notebook

This notebook scans the directory in which it lives to find all Jupyter notebooks (other than itself) in that directory. For every notebook it finds, it prints (1) a hyperlink to the notebook and (2) the notebook's first cell (which is always markdown). This way, you can read an automatically generated summary of all the notebooks without having to open each one. If you find a notebook that you want to explore further, simply click on its link to open it.

In [1]:
# Version: 2 (ignore case)
import os
import json
from IPython.display import display, Markdown

# The name of this file, which is excluded from the summary.
this_fname = 'SUMMARY.ipynb'
# Map each notebook's file name to the markdown source of its first cell.
fname_to_md = {}
# Sort case-insensitively so that the listing order ignores capitalization.
for fname in sorted(os.listdir('./'), key=str.casefold):
    if fname.endswith('.ipynb') and fname != this_fname:
        with open(fname, 'r', encoding="utf-8") as f:
            fdata = json.load(f)
        # 'source' is stored as a list of lines; join them into one string.
        fname_to_md[fname] = ''.join(fdata['cells'][0]['source'])
# Thick blue rule that separates consecutive notebook entries.
pre_sep = '\n\n<hr style="height:10px; background-color: blue;">\n\n'
full_md = ''
num_nb = len(fname_to_md)
project_name = "SentenceAx"
who = "rrtucci"
where = "jupyter_notebooks"
for k, (fname, md) in enumerate(fname_to_md.items(), start=1):
    local_link = f' [<a href="{fname}" target= "_blank">local link</a>] '
    github_link = (
        f' [<a href="https://github.com/{who}/{project_name}/blob/master/{where}/'
        f'{fname}">github link</a>] ')
    # Entry header: file name, both links, and a "k/total" counter.
    header = fname + local_link + github_link + f'{k}/{num_nb}' + '\n\n'
    full_md += pre_sep + header + md
display(Markdown(full_md))



<hr style="height:10px; background-color: blue;">

cc-train_test(pid=5).ipynb [<a href="cc-train_test(pid=5).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/cc-train_test(pid=5).ipynb">github link</a>] 1/13

# cc-train_test(pid=5)

SentenceAx uses 2 NNs, one for task="ex" and another for task="cc". This notebook trains the fully fledged (not a warmup) NN for task="cc".

<font color='red'>**NOTE: All checkpoint files (in the `weights/cc_model` directory) ending in ".ckpt" are erased every time this notebook is run.**</font>

After running this notebook, append the suffix ".best" (or something other than ".ckpt") to the name of the checkpoint (a.k.a. weights) file that this notebook outputs. Otherwise, the checkpoint file will be erased in the next run. Furthermore, notebooks for predicting need a weights file, and they are designed to look for a weights file whose name ends in ".best". If they can't find a weights file ending in ".best", they will abort.

If you are not doing the training yourself, you can find the weights that were generated by this notebook at https://huggingface.co/datasets/rrtucci/SentenceAx
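
The renaming step described above can be done by hand or with a few lines of Python. The following is a minimal sketch, not part of SentenceAx: the helper name `keep_checkpoint` is hypothetical, and it assumes that "appending the suffix" means turning `name.ckpt` into `name.ckpt.best` for the most recently modified ".ckpt" file in the directory.

```python
from pathlib import Path


def keep_checkpoint(weights_dir, suffix=".best"):
    """Append `suffix` to the newest ".ckpt" file in `weights_dir`.

    Hypothetical helper (not part of SentenceAx). Returns the new path,
    or None if the directory holds no ".ckpt" file.
    """
    ckpts = sorted(Path(weights_dir).glob("*.ckpt"),
                   key=lambda p: p.stat().st_mtime)
    if not ckpts:
        return None
    newest = ckpts[-1]
    # "model.ckpt" -> "model.ckpt.best": the name no longer ends in ".ckpt",
    # so the next training run should not erase the file.
    target = newest.with_name(newest.name + suffix)
    newest.rename(target)
    return target
```

For this notebook, the call would presumably be `keep_checkpoint("weights/cc_model")`.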

<hr style="height:10px; background-color: blue;">

ex-extract(pid=3).ipynb [<a href="ex-extract(pid=3).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/ex-extract(pid=3).ipynb">github link</a>] 2/13

# ex-extract(pid=3)

This notebook performs action="extract". For this action, the computer does no cc splitting, only ex extraction. If you want the computer to do both, split and extract, set the action to "splitextract".

The notebook reads the file:

`predicting/small_pred.txt`

with 6 sentences we want to extract from, and it writes the file

`predicting/small_pred_extract_ssents.txt`

with the predictions (i.e., ssents = simple sentences extracted from the original sentences).

**This notebook requires** that you first derive the ex and cc weights by running the notebooks `ex-train_test(pid=1)`
and `cc-train_test(pid=5)`.
Alternatively, you can download those weights from:

https://huggingface.co/datasets/rrtucci/SentenceAx

As explained in those 2 notebooks, the best weights should have the suffix ".best".
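
The ".best" convention can be pictured with a small lookup function. This is only a sketch of the convention described in the training notebooks, not SentenceAx's actual code: the name `find_best_weights` is hypothetical, and raising `FileNotFoundError` stands in for "aborting".

```python
from pathlib import Path


def find_best_weights(weights_dir):
    """Return the path of a weights file whose name ends in ".best".

    Hypothetical helper illustrating the naming convention; raises
    FileNotFoundError if no such file exists in `weights_dir`.
    """
    candidates = sorted(Path(weights_dir).glob("*.best"))
    if not candidates:
        raise FileNotFoundError(
            f'no weights file ending in ".best" in {weights_dir}')
    # If several files qualify, this sketch picks the first one by name.
    return candidates[0]
```

A predicting notebook would need one such file per task, for example in `weights/ex_model` and in `weights/cc_model`.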

<hr style="height:10px; background-color: blue;">

ex-splitextract(pid=6).ipynb [<a href="ex-splitextract(pid=6).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/ex-splitextract(pid=6).ipynb">github link</a>] 3/13

# ex-splitextract(pid=6)

This notebook performs action="splitextract". By setting split_only=False, we ask it to do both the cc splitting and the ex extraction.

The notebook reads the file:

`predicting/small_pred.txt`

with 6 sentences we want to splitextract, and it writes the file

`predicting/small_pred_splitextract_ssents.txt`

with the predictions (i.e., ssents = simple sentences extracted from the original sentences).

**This notebook requires** that you first derive the ex and cc weights by running the notebooks `ex-train_test(pid=1)`
and `cc-train_test(pid=5)`. As explained in those 2 notebooks, the best weights should have the suffix ".best".

<hr style="height:10px; background-color: blue;">

ex-splitextract(pid=6, split_only=True).ipynb [<a href="ex-splitextract(pid=6, split_only=True).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/ex-splitextract(pid=6, split_only=True).ipynb">github link</a>] 4/13

# ex-splitextract(pid=6, split_only=True)

This notebook performs action="splitextract". By setting split_only=True, we ask it to do the cc splitting but not the ex extraction.

The notebook reads the file:

`predicting/small_pred.txt`

with 6 sentences we want to split, and it writes the file

`predicting/small_pred_split_ssents.txt`

with the predictions (i.e., ssents = simple sentences extracted from the original sentences).

**This notebook requires** that you first derive the ex and cc weights by running the notebooks `ex-train_test(pid=1)`
and `cc-train_test(pid=5)`. As explained in those 2 notebooks, the best weights should have the suffix ".best".

<hr style="height:10px; background-color: blue;">

ex-train_test(pid=1).ipynb [<a href="ex-train_test(pid=1).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/ex-train_test(pid=1).ipynb">github link</a>] 5/13

# ex-train_test(pid=1)

SentenceAx uses 2 NNs, one for task="ex" and another for task="cc". This notebook trains the fully fledged (not a warmup) NN for task="ex".

<font color='red'>**NOTE: All checkpoint files (in the `weights/ex_model` directory) ending in ".ckpt" are erased every time this notebook is run.**</font>

After running this notebook, append the suffix ".best" (or something other than ".ckpt") to the name of the checkpoint (a.k.a. weights) file that this notebook outputs. Otherwise, the checkpoint file will be erased in the next run. Furthermore, notebooks for predicting need a weights file, and they are designed to look for a weights file whose name ends in ".best". If they can't find a weights file ending in ".best", they will abort.

If you are not doing the training yourself, you can find the weights that were generated by this notebook at https://huggingface.co/datasets/rrtucci/SentenceAx

<hr style="height:10px; background-color: blue;">

ex-train_test(pid=1)epoch=20_tune_epoch_acc=0.5189.ipynb [<a href="ex-train_test(pid=1)epoch=20_tune_epoch_acc=0.5189.ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/ex-train_test(pid=1)epoch=20_tune_epoch_acc=0.5189.ipynb">github link</a>] 6/13

# ex-train_test(pid=1)

SentenceAx uses 2 NNs, one for task="ex" and another for task="cc". This notebook trains the fully fledged (not a warmup) NN for task="ex".

<font color='red'>**NOTE: All checkpoint files (in the `weights/ex_model` directory) ending in ".ckpt" are erased every time this notebook is run.**</font>

After running this notebook, append the suffix ".best" (or something other than ".ckpt") to the name of the checkpoint (a.k.a. weights) file that this notebook outputs. Otherwise, the checkpoint file will be erased in the next run. Furthermore, notebooks for predicting need a weights file, and they are designed to look for one whose name ends in ".best", so without the renaming they won't find it.

If you are not doing the training yourself, you can find the weights that were generated by this notebook at https://huggingface.co/datasets/rrtucci/SentenceAx

<hr style="height:10px; background-color: blue;">

global_variables.ipynb [<a href="global_variables.ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/global_variables.ipynb">github link</a>] 7/13

# Global Variables

This notebook prints the global variables in the file `sax_globals.py`.

<hr style="height:10px; background-color: blue;">

tensorboard_tips.ipynb [<a href="tensorboard_tips.ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/tensorboard_tips.ipynb">github link</a>] 8/13

# Tensorboard (TB) tips

Make sure you have run

    pip install tensorboard

To view TB in a Jupyter cell at the end of a run, do this:


    %reload_ext tensorboard
    %tensorboard --logdir=logs/ex


To view TB in a browser **during** or at the end of a run, open a terminal and do this:

    tensorboard --logdir=logs/ex
    
(Replace logs/ex with logs/cc to view the cc folder instead of the ex folder. Likewise, replace
logs/ex with logs_warmup/ex, and logs/cc with logs_warmup/cc, to view the warmup results.)


IMPORTANT: On Windows, there is a known bug (documented on the web) that occurs when you try to display TB logs in a Jupyter cell. If a folder named ".tensorboard-info" exists, it must be deleted or TB won't open.
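
The stale folder can also be removed from Python before launching TB. This sketch assumes that TensorBoard keeps ".tensorboard-info" in the system temporary directory (as returned by `tempfile.gettempdir()`); that location is an assumption, so check where the folder actually lives on your machine. The helper name is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tensorboard_info(parent_dir=None):
    """Delete the ".tensorboard-info" folder if it exists.

    `parent_dir` defaults to the system temp directory, which is an
    assumption about where TensorBoard keeps this folder.
    Returns True if a folder was deleted, False otherwise.
    """
    parent = Path(parent_dir) if parent_dir else Path(tempfile.gettempdir())
    info_dir = parent / ".tensorboard-info"
    if info_dir.is_dir():
        shutil.rmtree(info_dir)
        return True
    return False
```

Passing `parent_dir` explicitly lets you point the helper at another location if the folder turns out to be elsewhere.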

Tips:
* There is no "expand all" button for expanding all the plot panes at once. However, if you search for _*, you will get a pane with all the plots in it.

* For every scalar plot (i.e., 2D plot, with time on the X axis), you see a smoothed curve fit and a fainter unsmoothed one. A slider controls the amount of smoothing applied to the smoothed one. When the smoothing is zero, the smoothed (unfaint) and unsmoothed (faint) curve fits coincide.

* A menu allows one to choose among 3 options for the time units along the X axis:
    1. step number (e.g., step 3),
    2. wall time (e.g., 3:12 PM),
    3. relative time (e.g., 20:34 minutes since launch of experiment).

* The plot with the Y axis labelled "epoch" took me a while to figure out. Assume X = step number, Y = epoch. Then the map X->Y is a **finite set of points** (FSP). Confusingly, instead of plotting that FSP, TB shows only two **continuous curve fits** (CCF) to that FSP, a smoothed curve fit and an unsmoothed one (fainter). They could have plotted both the FSP (what is called a scatter plot) and the two CCFs, but instead plotted only the CCFs. When num_steps ~ 10 per epoch and num_epochs ~ 2, these two CCFs look weird because the number of points in the FSP is small (just 10 × 2 = 20). But if num_steps ~ 1,000 per epoch and num_epochs ~ 5, then the number of points in the FSP grows (to 5,000), and the two CCFs and the FSP start to merge. A fractional number of epochs does have an intuitive meaning: if 10.45 epochs have transpired, and the number of steps per epoch is 1,000, then 10,450 steps have transpired.
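
The arithmetic in the last tip can be spelled out in a few lines. This sketch assumes a fixed number of steps per epoch and one plotted point per step; the function names are illustrative and are not TensorBoard or SentenceAx APIs.

```python
def num_fsp_points(steps_per_epoch, num_epochs):
    """Number of points in the FSP, assuming one point per step."""
    return steps_per_epoch * num_epochs


def steps_elapsed(epochs, steps_per_epoch):
    """Number of steps corresponding to a (possibly fractional) epoch count."""
    # round() guards against floating-point error in the product.
    return round(epochs * steps_per_epoch)


print(num_fsp_points(10, 2))       # 20
print(num_fsp_points(1000, 5))     # 5000
print(steps_elapsed(10.45, 1000))  # 10450
```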

<hr style="height:10px; background-color: blue;">

warmup-cc-train_test(pid=5).ipynb [<a href="warmup-cc-train_test(pid=5).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/warmup-cc-train_test(pid=5).ipynb">github link</a>] 9/13

# warmup cc-train_test(pid=5)

SentenceAx uses 2 NNs, one for task="ex" and another for task="cc". This warmup notebook trains the NN for task="cc".

The warmup NN uses small sizes for everything so that it can be trained quickly, though not accurately, without a GPU.

<font color='red'>**NOTE: All checkpoint files (in the `weights_warmup/cc_model` directory) ending in ".ckpt" are erased every time this notebook is run.**</font>

After running this notebook, append the suffix ".best" (or something other than ".ckpt") to the name of the checkpoint (a.k.a. weights) file that this notebook outputs. Otherwise, the checkpoint file will be erased in the next run. Furthermore, notebooks for predicting need a weights file, and they are designed to look for a weights file whose name ends in ".best". If they can't find a weights file ending in ".best", they will abort.

<hr style="height:10px; background-color: blue;">

warmup-ex-extract(pid=3).ipynb [<a href="warmup-ex-extract(pid=3).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/warmup-ex-extract(pid=3).ipynb">github link</a>] 10/13

# warmup ex-extract(pid=3)

This warmup notebook performs action="extract". For this action, the computer does no cc splitting, only ex extraction. If you want the computer to do both, split and extract, set the action to "splitextract".

The notebook reads the file:

`predicting/small_pred.txt`

with 6 sentences we want to extract from, and it writes the file

`predicting/small_pred_extract_ssents.txt`

with the predictions (i.e., ssents = simple sentences extracted from the original sentences).

**This notebook requires** that you first derive the ex and cc weights by running the notebooks `warmup-ex-train_test(pid=1)`
and `warmup-cc-train_test(pid=5)`. As explained in those 2 notebooks, the best weights should have the suffix ".best".

<hr style="height:10px; background-color: blue;">

warmup-ex-splitextract(pid=6).ipynb [<a href="warmup-ex-splitextract(pid=6).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/warmup-ex-splitextract(pid=6).ipynb">github link</a>] 11/13

# warmup ex-splitextract(pid=6)

This warmup notebook performs action="splitextract". By setting split_only=False, we ask it to do both the cc splitting and the ex extraction.

The notebook reads the file:

`predicting/small_pred.txt`

with 6 sentences we want to splitextract, and it writes the file

`predicting/small_pred_splitextract_ssents.txt`

with the predictions (i.e., ssents = simple sentences extracted from the original sentences).

**This notebook requires** that you first derive the ex and cc weights by running the notebooks `warmup-ex-train_test(pid=1)`
and `warmup-cc-train_test(pid=5)`. As explained in those 2 notebooks, the best weights should have the suffix ".best".

<hr style="height:10px; background-color: blue;">

warmup-ex-splitextract(pid=6, split_only=True).ipynb [<a href="warmup-ex-splitextract(pid=6, split_only=True).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/warmup-ex-splitextract(pid=6, split_only=True).ipynb">github link</a>] 12/13

# warmup ex-splitextract(pid=6, split_only=True)

This notebook requires that there be a weights file with the suffix ".best" in the weights_warmup/ex_model folder. See the training notebooks for more info about this ".best" weights file.

This warmup notebook performs action="splitextract". By setting split_only=True, we ask it to do the cc splitting but not the ex extraction.

The notebook reads the file:

`predicting/small_pred.txt`

with 6 sentences we want to split, and it writes the file

`predicting/small_pred_split_ssents.txt`

with the predictions (i.e., ssents = simple sentences extracted from the original sentences).

The warmup NN uses small sizes for everything so that it can be trained quickly, though not accurately, without a GPU.

**This notebook requires** that you first derive the ex and cc weights by running the notebooks `warmup-ex-train_test(pid=1)`
and `warmup-cc-train_test(pid=5)`. As explained in those 2 notebooks, the best weights should have the suffix ".best".

<hr style="height:10px; background-color: blue;">

warmup-ex-train_test(pid=1).ipynb [<a href="warmup-ex-train_test(pid=1).ipynb" target= "_blank">local link</a>]  [<a href="https://github.com/rrtucci/SentenceAx/blob/master/jupyter_notebooks/warmup-ex-train_test(pid=1).ipynb">github link</a>] 13/13

# warmup ex-train_test(pid=1)

SentenceAx uses 2 NNs, one for task="ex" and another for task="cc". This warmup notebook trains the NN for task="ex".

The warmup NN uses small sizes for everything so that it can be trained quickly, though not accurately, without a GPU.

<font color='red'>**NOTE: All checkpoint files (in the `weights_warmup/ex_model` directory) ending in ".ckpt" are erased every time this notebook is run.**</font>

After running this notebook, append the suffix ".best" (or something other than ".ckpt") to the name of the checkpoint (a.k.a. weights) file that this notebook outputs. Otherwise, the checkpoint file will be erased in the next run. Furthermore, notebooks for predicting need a weights file, and they are designed to look for a weights file whose name ends in ".best". If they can't find a weights file ending in ".best", they will abort.