In [None]:
%sx pwd

In [42]:
#get_ipython().__getstate__()

# How to Un-Delete Your Jupyter Notebooks

- Ray (Lora) Johns, NLU Research Engineer & Computational Linguist

## What we'll cover:

1. Structure of a Jupyter Notebook
2. The IPython session
3. Jupyter %%magics
4. Handy notebook tricks
5. Viewing session history with sqlite
6. Version control and backup with jupytext
7. The %store magic for macros and more

## What you'll be able to do:

1. Understand that notebooks are functional wrappers for different language kernels
2. Manipulate Jupyter data structures to recover lost data
3. Use IPython to access code and output and convert it to different formats
4. Do data forensics on your code history with SQL 
5. Create reusable macros to streamline your Jupyter workflow  

# What's a Jupyter notebook, anyway?

A Jupyter notebook is just a dictionary with a few keys and a [JSON schema](https://github.com/jupyter/nbformat/blob/master/nbformat/v4/nbformat.v4.schema.json).

```{json}
{
  "metadata" : {
    "kernel_info": {
        # if kernel_info is defined, its name field is required.
        "name" : "the name of the kernel"
    },
    "language_info": {
        # if language_info is defined, its name field is required.
        "name" : "the programming language of the kernel",
        "version": "the version of the language",
        "codemirror_mode": "The name of the codemirror mode to use [optional]"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0,
  "cells" : [
      # list of cell dictionaries, see below
  ],
}
```
Primarily, there are __code cells__, __output cells__, and __markdown cells__.

- __code cells__ are where you input code to execute, as well as any comments.

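- __output cells__ are where the output of code cells appears. In the notebook file, this output is stored in the `outputs` list of the code cell that produced it. Notebooks are interactive, like the IPython REPL. Each cell gets its own output, but cells share access to variables.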

- __markdown cells__ like this one allow you to input formatted text. Jupyter supports displaying many kinds of formatting, including HTML and $\LaTeX$.
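
To make the structure concrete, here is a minimal sketch that parses a hand-written stand-in for an `.ipynb` file and summarizes its cells. The `notebook_json` string and the `summarize_cells` helper are illustrative assumptions, not part of Jupyter. With a real file you would pass the result of `json.load` on the notebook instead.

```python
import json

# A hand-written stand-in for the contents of an .ipynb file (nbformat 4).
notebook_json = """
{
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 0,
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
    {"cell_type": "code", "metadata": {}, "execution_count": 1,
     "source": ["1 + 1"],
     "outputs": [{"output_type": "execute_result", "execution_count": 1,
                  "metadata": {}, "data": {"text/plain": ["2"]}}]}
  ]
}
"""


def summarize_cells(nb):
    """Return (cell_type, source, number_of_outputs) for every cell."""
    summary = []
    for cell in nb["cells"]:
        source = cell["source"]
        # nbformat allows source to be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        # Only code cells carry an "outputs" list; other cells count as 0.
        summary.append((cell["cell_type"], source, len(cell.get("outputs", []))))
    return summary


notebook = json.loads(notebook_json)
cell_summary = summarize_cells(notebook)
for cell_type, source, n_outputs in cell_summary:
    print(f"{cell_type:9} outputs={n_outputs}  {source!r}")
```

Because the file is plain JSON, the same loop can pull the source of every code cell out of a saved notebook even when the notebook itself will not open.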


# Scenario: Deleting a few cells in an active notebook

When you run a cell or hit “save”, the Jupyter notebook server passes your code along as JSON, and both the running session and the notebook file on your computer keep a record of your input and output. You can use these records to recover code that you deleted from the notebook's surface.

In [2]:
# Run some code to generate some fake data
import numpy as np

In [3]:
# Set the random seed
np.random.seed(6543791)

In [4]:
# Generate 100 samples from the uniform distribution
array = np.random.uniform(size=100)

In [5]:
# Specify a boolean condition
condition = array < 0.5

In [6]:
# Slice the array to return matching data
array_lt_half = array[condition]

In [7]:
# Slice the array to return non-matching data
array_geq_half = array[~condition]

In [8]:
len(array_lt_half)

47

In [9]:
len(array_geq_half)

53

In [10]:
idx = np.random.randint(0, min(len(array_geq_half), len(array_lt_half)))
idx

1

In [11]:
# Sanity check a sample from each subarray
print(array_lt_half[idx])
print(array_geq_half[idx])

0.04866575720571309
0.6327758725506353


## Using `In` and `Out`

- `In` is a list. You can use any Python list operations on it.
- `Out` is a dict. You must access it using its keys, which are the cell numbers of the code cells that produced the output.
- `Out` only stores the results of cells that sent a computation or return value to the REPL. The same code, if you run it twice, will get two dict keys, one for each numbered cell. (You can rerun cells easily with `Shift+Return`.)
- Markdown cells are not counted or recorded.
- `_` stores the most recent previous output.
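
As a rough sketch of how these two objects can be combined for recovery, the hypothetical helper below pairs each input with its recorded output and returns the result as text. The names `dump_session`, `fake_in`, and `fake_out` are assumptions for illustration. Inside a live notebook you would call `dump_session(In, Out)` instead of using the stand-ins.

```python
def dump_session(inputs, outputs):
    """Pair each executed input with its recorded output, if any."""
    lines = []
    for number, source in enumerate(inputs):
        if not source:  # In[0] is an empty string in IPython
            continue
        lines.append(f"# In [{number}]:")
        lines.append(source)
        if number in outputs:
            lines.append(f"# Out[{number}]: {outputs[number]!r}")
        lines.append("")
    return "\n".join(lines)


# Stand-ins shaped like IPython's `In` (a list) and `Out` (a dict).
fake_in = ["", "import numpy as np", "len([1, 2, 3])"]
fake_out = {2: 3}

recovered = dump_session(fake_in, fake_out)
print(recovered)
```

Writing the returned string to a `.py` file gives you a plain-text copy of the session that survives even if the cells are later deleted.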

In [12]:
type(In)

list

In [13]:
type(Out)

dict

In [14]:
_

dict

In [15]:
# Note that each cell is its own item

# Get the last 5 cells run

for cell in In[-5:]:
    print(cell)
    print('\n')

# Sanity check a sample from each subarray
print(array_lt_half[idx])
print(array_geq_half[idx])


type(In)


type(Out)


_


# Note that each cell is its own item

# Get the last 5 cells run

for cell in In[-5:]:
    print(cell)
    print('\n')




In [16]:
import random 
random.seed(73852)

In [17]:
def return_some_data(array1, array2):
    idx = np.random.randint(min(len(array1), len(array2)))
    arr = random.choice([array1, array2])
    return arr[idx]

In [18]:
data = return_some_data(array_geq_half, array_lt_half)
data

0.22994183653328526

In [19]:
# Run this cell several times by hitting `Ctrl+Return`, or rerun it with the %rerun magic

data*100

22.994183653328527

In [20]:
%rerun 81

No lines in history match specification


In [21]:
# Out is a dict that holds the outputs to the REPL.
# Note the repeated values.
Out

{1: ['/home/ray/Development/medium/notebooks'],
 8: 47,
 9: 53,
 10: 1,
 12: list,
 13: dict,
 14: dict,
 18: 0.22994183653328526,
 19: 22.994183653328527}

In [22]:
Out.get(19)

## Using the `%history` magic

- The `%history` line magic prints your input history in the order the cells were run.
- The similar `%notebook` command does the same thing but pipes the output into a new Jupyter notebook.

In [23]:
%history -l 5 # get the last 5 inputs

data = return_some_data(array_geq_half, array_lt_half)
data
# Run this cell several times by hitting `Ctrl+Return`, or rerun it with the %rerun magic

data*100
%rerun 81
# Out is a dict that holds the outputs to the REPL.
# Note the repeated values.
Out
Out.get(30)


In [24]:
%history -g -f my_history.py # save your entire history to a file




In [93]:
%history -g 'python' # search the history with a glob pattern

92: %history -n -g 'python' # search the history with a glob pattern
  93: %history -g 'python' # search the history with a glob pattern


In [107]:
%history -n 1-5 # lines in the specified range and print the line numbers

1: %sx pwd
   2:
# Run some code to generate some fake data
import numpy as np
   3:
# Set the random seed
np.random.seed(6543791)
   4:
# Generate 100 samples from the uniform distribution
array = np.random.uniform(size=100)
   5:
# Specify a boolean condition
condition = array < 0.5


In [96]:
%history -u # get only unique history from the session

In [99]:
%history -t 1 # get the Python-generated source code

get_ipython().run_line_magic('sx', 'pwd')


In [109]:
%history ~2/1-2 # get history from [SESSION_NO]/[RANGE_NO]

# ~<number> is shorthand for "session from <number> sessions previously"

%sx pwd
# Run some code to generate some fake data
import numpy as np


In [111]:
%history 1 -o # return the output too

%sx pwd
['/home/ray/Development/medium/notebooks']


In [113]:
%history -l 3 # get the last n lines from all sessions, default is 10

%history -o
%history 1 -o
%history -l 15 # get the last n lines from all sessions, default is 10


In [114]:
%history -f saved_history.py # save any slice to a file

# Scenario: Getting back a lot of data

Things to try:

### Check the sqlite database in `~/.ipython/profile_default`

- Copy the sqlite history to a backup
- Open the backup in a SQL editor to find your session
- Alternatively, use the sqlite CLI
- Query the database and output the desired code into a recovery file with a `.py` extension
- Use jupytext to turn the script back into a notebook
- Make backups of your notebooks with jupytext (Bonus: these are way better to version control on git!)
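
The first step, copying the history database, can also be done from Python. The sketch below uses the standard library's `sqlite3` backup API on a throwaway stand-in database. The `backup_database` helper and the temporary file names are assumptions for illustration. For a real recovery you would point it at the `history.sqlite` file in your profile directory.

```python
import sqlite3
import tempfile
from pathlib import Path


def backup_database(source_path, backup_path):
    """Copy a SQLite database using sqlite3's online backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "history.sqlite"
    copy = Path(tmp) / "history-bak.sqlite"

    # Create a tiny stand-in database to copy.
    conn = sqlite3.connect(original)
    conn.execute("CREATE TABLE history (session INTEGER, line INTEGER, source TEXT)")
    conn.execute("INSERT INTO history VALUES (1, 1, 'import numpy as np')")
    conn.commit()
    conn.close()

    backup_database(original, copy)

    # Read the copy back to confirm the rows arrived.
    check = sqlite3.connect(copy)
    copied_rows = check.execute("SELECT source FROM history").fetchall()
    check.close()

print(copied_rows)
```

Working on a copy means a mistyped query cannot damage the history that IPython is still writing to.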

### Check `.ipynb_checkpoints`
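
With Jupyter's default file-based checkpoints, the last checkpoint of a notebook sits in a hidden `.ipynb_checkpoints` folder next to it. The sketch below builds that path. It assumes the default layout, and `checkpoint_path` is a hypothetical helper name, so a customized server may store checkpoints elsewhere.

```python
from pathlib import Path


def checkpoint_path(notebook_path):
    """Return where Jupyter's default checkpoint for a notebook would live."""
    notebook_path = Path(notebook_path)
    return (
        notebook_path.parent
        / ".ipynb_checkpoints"
        / f"{notebook_path.stem}-checkpoint{notebook_path.suffix}"
    )


candidate = checkpoint_path("notebooks/jupyter_tutorial.ipynb")
print(candidate, "exists" if candidate.exists() else "not found")
```

If the file exists, copy it somewhere safe before reopening the notebook, since the next checkpoint overwrites it.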

## Use sqlite on the command line
- Use the Jupyter `%%bash` cell magic to write bash commands in the notebook!
- Copy the database
- Output the query to `.py` and convert to `.ipynb` with jupytext
- Use jupytext to make `.py` backups of your notebooks

In [123]:
%%bash

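find ~/.ipython/profile_default -name 'history.sqlite'

# Copy the database next to the original so later queries can read the backup
cp ~/.ipython/profile_default/history.sqlite ~/.ipython/profile_default/history-bak.sqlite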

/home/ray/.ipython/profile_default/history.sqlite


In [124]:
%%bash

sqlite3 ~/.ipython/profile_default/history-bak.sqlite \
"select distinct(source) || char(10) from history where session = 1;" > recovered_code.py

In [126]:
%%bash

jupytext --to notebook recovered_code.py

[jupytext] Reading recovered_code.py
[jupytext] Writing recovered_code.ipynb


In [129]:
%%bash

jupytext --to py jupyter_tutorial.ipynb

[jupytext] Reading jupyter_tutorial.ipynb
[jupytext] Writing jupyter_tutorial.py


## Querying multiple tables

This example returns all the distinct code inputs since January 1, 2020 that contain the string `pytest` and saves them to a file in the current working directory called `pytest_history.py`.

In [130]:
%%bash

sqlite3 ~/.ipython/profile_default/history-bak.sqlite \
"select distinct history.source \
from history \
join sessions \
on sessions.session = history.session \
where sessions.start > '2020-01-01 00:00:00.000000' \
and history.source like '%pytest%';" \
> pytest_history.py
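
The same kind of join can be run from Python with the standard library's `sqlite3` module. The sketch below runs against an in-memory stand-in that creates only the columns the query needs. The sample rows and the `search_history` helper are assumptions for illustration. To search real history, connect to your backup file instead of `":memory:"`.

```python
import sqlite3

# In-memory stand-in for a copy of history.sqlite; only the columns the
# query needs are created here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (session INTEGER PRIMARY KEY, start TEXT);
    CREATE TABLE history (session INTEGER, line INTEGER, source TEXT);
    INSERT INTO sessions VALUES (1, '2019-12-30 09:00:00.000000'),
                                (2, '2020-02-01 09:00:00.000000');
    INSERT INTO history VALUES (1, 1, 'import pytest'),
                               (2, 1, 'import numpy as np'),
                               (2, 2, '!pytest tests/'),
                               (2, 3, '!pytest tests/');
""")


def search_history(conn, pattern, since):
    """Return distinct inputs matching a LIKE pattern from sessions started after `since`."""
    rows = conn.execute(
        """
        SELECT DISTINCT history.source
        FROM history
        JOIN sessions ON sessions.session = history.session
        WHERE sessions.start > ? AND history.source LIKE ?
        ORDER BY history.source
        """,
        (since, pattern),
    )
    return [source for (source,) in rows]


matches = search_history(conn, "%pytest%", "2020-01-01 00:00:00.000000")
print("\n".join(matches))
conn.close()
```

Passing the pattern and date as `?` parameters avoids the quoting problems that come with building SQL strings inside a shell command.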

# Helpful shortcuts, magics, and macros

- Use IPython’s `%store` magic to store variables, macros, and aliases in the IPython database. You can save anything you want from one notebook and then reload it later in another.

Setting `c.StoreMagics.autorestore = True` in `ipython_config.py` tells IPython to restore everything you have stored automatically when a session starts.

It is a good idea to back up your config files before running bash scripts that directly edit them.

In [None]:
%%bash

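config="$HOME/.ipython/profile_default/ipython_config.py"

# Create the config file if it does not exist yet
if [ ! -e "$config" ]; then
    touch "$config"
else
    echo "Profile exists"
fi

# Restore %store-d objects automatically when IPython starts
echo "c.StoreMagics.autorestore = True" >> "$config"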

- Save objects with `%store`
- See what's in `%store` and reload objects with `%store -r`

In [134]:
%store

Stored variables and their in-db values:
__add_work_dirs             -> IPython.macro.Macro('import os\nimport sys\n\ndirs
__load_env                  -> IPython.macro.Macro("import os, sys, subprocess\n\
__load_nlm                  -> IPython.macro.Macro("get_ipython().run_line_magic(


In [136]:
%store -r __load_env

## Define a macro

The syntax to create a macro is 
```{bash}
%macro <name-of-macro> <range(s) of cells to add to macro>
```

Make one out of the cells that run your SQL query and convert the recovered history into a new notebook, save it with `%store`, then check that it's in the database and inspect its contents.

In [138]:
%macro __recover_code 124 126

Macro `__recover_code` created. To execute, type its name (without quotes).
=== Macro contents: ===
get_ipython().run_cell_magic('bash', '', '\nsqlite3 ~/.ipython/profile_default/history-bak.sqlite \\\n"select source || char(10) from history where session = 1;" > recovered_code.py\n')
get_ipython().run_cell_magic('bash', '', '\njupytext --to notebook recovered_code.py\n')


In [139]:
%store __recover_code

Stored '__recover_code' (Macro)


In [140]:
%store

Stored variables and their in-db values:
__add_work_dirs             -> IPython.macro.Macro('import os\nimport sys\n\ndirs
__load_env                  -> IPython.macro.Macro("import os, sys, subprocess\n\
__load_nlm                  -> IPython.macro.Macro("get_ipython().run_line_magic(
__recover_code              -> IPython.macro.Macro('get_ipython().run_cell_magic(


In [141]:
__recover_code?

Type:           Macro
String form:   
get_ipython().run_cell_magic('bash', '', '\nsqlite3 ~/.ipython/profile_default/history-bak.sqlite <...> .py\n')
           get_ipython().run_cell_magic('bash', '', '\njupytext --to notebook recovered_code.py\n')
           
File:           ~/.pyenv/versions/medium/lib/python3.7/site-packages/IPython/core/macro.py
Docstring:     
Simple class to store the value of macros as strings.

Macro is just a callable that executes a string of IPython
input when called.
Init docstring: store the macro value, as a single string which can be executed


In [None]:
%%bash

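# Convert every notebook under the current directory to a .py script
# (assumes notebook paths contain no whitespace)
for file in $(find . -type f -name "*.ipynb"); do
    jupytext --to py "$file"
done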
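
A Python variant of this loop can skip the copies in `.ipynb_checkpoints`, which a plain `find` would also convert. This is a sketch under stated assumptions: `find_notebooks` and `jupytext_commands` are hypothetical helper names, and the demo runs on a throwaway directory and only builds the command lists rather than invoking jupytext.

```python
import tempfile
from pathlib import Path


def find_notebooks(root):
    """Return notebooks under `root`, skipping Jupyter's checkpoint copies."""
    return sorted(
        path for path in Path(root).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


def jupytext_commands(root):
    """Build one `jupytext --to py` argument list per notebook found."""
    return [["jupytext", "--to", "py", str(path)] for path in find_notebooks(root)]


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".ipynb_checkpoints").mkdir()
    (root / "analysis.ipynb").write_text("{}")
    (root / ".ipynb_checkpoints" / "analysis-checkpoint.ipynb").write_text("{}")
    found = [path.name for path in find_notebooks(root)]
    commands = jupytext_commands(root)

print(found)
```

To run the conversions for real, pass each list to `subprocess.run(command, check=True)` on a machine where jupytext is installed.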