# ParaFrame Demo

This Jupyter notebook is a short demo of how to use Hallmark ParaFrame.

Hallmark ParaFrame is a Pandas DataFrame, extended by monkey patching, that is designed to handle parameter surveys with many output files.  The user provides a Python format string to `hallmark.ParaFrame`; hallmark then parses the directory and file names to construct a ParaFrame with the corresponding parameters.
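
To make the idea of recovering parameters from file names concrete, here is a minimal sketch that uses only the Python standard library. It is not hallmark's implementation: the helpers `format_to_regex` and `parse_path` are hypothetical names introduced for illustration, and the sketch assumes that only `:d` fields need to be converted to integers.

```python
import re
from string import Formatter

def format_to_regex(fmt):
    """Hypothetical helper: turn a format string into a regex with named groups."""
    parts = []
    for literal, field, spec, _conv in Formatter().parse(fmt):
        parts.append(re.escape(literal))
        if field is not None:
            # ":d" fields match integers; any other field matches up to the next "/".
            if spec == "d":
                parts.append(f"(?P<{field}>-?\\d+)")
            else:
                parts.append(f"(?P<{field}>[^/]+)")
    return re.compile("".join(parts) + r"\Z")

def parse_path(fmt, path):
    """Hypothetical helper: return the parameters encoded in path, or None."""
    match = format_to_regex(fmt).match(path)
    if match is None:
        return None
    specs = {field: spec for _, field, spec, _ in Formatter().parse(fmt) if field}
    # Convert ":d" fields to int and leave everything else as a string.
    return {k: int(v) if specs[k] == "d" else v for k, v in match.groupdict().items()}

fmt = "data/a_{a:d}/b_{b:d}.txt"
params = parse_path(fmt, "data/a_3/b_17.txt")
print(params)
```

Each parsed dictionary corresponds to one row of parameters, which is the kind of information a ParaFrame keeps next to the file path.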

## Create Sample Data Files

We start by creating a data directory with files.  The file paths follow the Python format string:

    f"data/a_{a:d}/b_{b:d}.txt"

In [1]:
%%bash

for a in {0..9}; do
  mkdir -p "data/a_$a"
  for b in {10..19}; do
    touch "data/a_$a/b_$b.txt"
  done
done

ls data/*

data/a_0:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_1:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_2:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_3:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_4:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_5:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_6:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_7:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_8:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt

data/a_9:
b_10.txt
b_11.txt
b_12.txt
b_13.txt
b_14.txt
b_15.txt
b_16.txt
b_17.txt
b_18.txt
b_19.txt
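
For readers without a shell that supports brace expansion, the same layout can be sketched in Python. This is an illustrative alternative, not part of hallmark: `make_sample_files` is a hypothetical helper, and the example writes into a temporary directory rather than the `data` directory used in this notebook.

```python
import tempfile
from pathlib import Path

def make_sample_files(root, fmt="data/a_{a:d}/b_{b:d}.txt"):
    """Hypothetical helper: create empty files for a in 0..9 and b in 10..19."""
    created = []
    for a in range(10):
        for b in range(10, 20):
            path = Path(root) / fmt.format(a=a, b=b)
            path.parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
            path.touch()                                    # like `touch`
            created.append(path)
    return created

# Work in a temporary directory so nothing is left behind afterwards.
with tempfile.TemporaryDirectory() as tmp:
    files = make_sample_files(tmp)
    n_files = len(files)
    first = files[0].relative_to(tmp).as_posix()
    print(n_files, first)
```

Passing `"."` as `root` instead of a temporary directory would create the files relative to the current working directory.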


## Create a Hallmark ParaFrame from the Files

Next, we simply use Hallmark ParaFrame to create a "database" of these files.  Note that the format string is passed in without the `f` prefix.

In [2]:
from hallmark import ParaFrame

In [3]:
pf = ParaFrame("data/a_{a:d}/b_{b:d}.txt")

In [4]:
pf

Unnamed: 0,path,a,b
0,data/a_0/b_10.txt,0,10
1,data/a_0/b_11.txt,0,11
2,data/a_0/b_12.txt,0,12
3,data/a_0/b_13.txt,0,13
4,data/a_0/b_14.txt,0,14
...,...,...,...
95,data/a_9/b_15.txt,9,15
96,data/a_9/b_16.txt,9,16
97,data/a_9/b_17.txt,9,17
98,data/a_9/b_18.txt,9,18
99,data/a_9/b_19.txt,9,19


## ParaFrame Filter

Hallmark ParaFrame is simply a Pandas DataFrame with a filter.  Calling the ParaFrame as a function performs a standard filter, in which the keyword arguments are combined as "or" selection criteria.  An "and" selection can be done by chaining the function calls.

In [5]:
# Filter a==0
pf(a=0)

Unnamed: 0,path,a,b
0,data/a_0/b_10.txt,0,10
1,data/a_0/b_11.txt,0,11
2,data/a_0/b_12.txt,0,12
3,data/a_0/b_13.txt,0,13
4,data/a_0/b_14.txt,0,14
5,data/a_0/b_15.txt,0,15
6,data/a_0/b_16.txt,0,16
7,data/a_0/b_17.txt,0,17
8,data/a_0/b_18.txt,0,18
9,data/a_0/b_19.txt,0,19


In [6]:
# Filter a==0 or 1
pf(a=[0,1])

Unnamed: 0,path,a,b
0,data/a_0/b_10.txt,0,10
1,data/a_0/b_11.txt,0,11
2,data/a_0/b_12.txt,0,12
3,data/a_0/b_13.txt,0,13
4,data/a_0/b_14.txt,0,14
5,data/a_0/b_15.txt,0,15
6,data/a_0/b_16.txt,0,16
7,data/a_0/b_17.txt,0,17
8,data/a_0/b_18.txt,0,18
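9,data/a_0/b_19.txt,0,19
10,data/a_1/b_10.txt,1,10
11,data/a_1/b_11.txt,1,11
12,data/a_1/b_12.txt,1,12
13,data/a_1/b_13.txt,1,13
14,data/a_1/b_14.txt,1,14
15,data/a_1/b_15.txt,1,15
16,data/a_1/b_16.txt,1,16
17,data/a_1/b_17.txt,1,17
18,data/a_1/b_18.txt,1,18
19,data/a_1/b_19.txt,1,19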


In [7]:
# Filter a==0 or b==10
pf(a=0, b=10)

Unnamed: 0,path,a,b
0,data/a_0/b_10.txt,0,10
1,data/a_0/b_11.txt,0,11
2,data/a_0/b_12.txt,0,12
3,data/a_0/b_13.txt,0,13
4,data/a_0/b_14.txt,0,14
5,data/a_0/b_15.txt,0,15
6,data/a_0/b_16.txt,0,16
7,data/a_0/b_17.txt,0,17
8,data/a_0/b_18.txt,0,18
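9,data/a_0/b_19.txt,0,19
10,data/a_1/b_10.txt,1,10
20,data/a_2/b_10.txt,2,10
30,data/a_3/b_10.txt,3,10
40,data/a_4/b_10.txt,4,10
50,data/a_5/b_10.txt,5,10
60,data/a_6/b_10.txt,6,10
70,data/a_7/b_10.txt,7,10
80,data/a_8/b_10.txt,8,10
90,data/a_9/b_10.txt,9,10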


In [8]:
# Filter a==0 and b==10
pf(a=0)(b=10)

Unnamed: 0,path,a,b
0,data/a_0/b_10.txt,0,10
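
The "or" and "and" behavior of the cells above can be mimicked on a plain list of dictionaries. The following is only a sketch of the selection semantics described in this section, not hallmark's code: `select` is a hypothetical function, and returning all rows when no keyword is given is an assumption.

```python
def select(rows, **criteria):
    """Hypothetical filter: keep rows that match ANY of the keyword criteria."""
    if not criteria:
        return list(rows)  # assumption: no criteria means no filtering

    def matches(row, key, wanted):
        # A list, tuple, or set of values matches if the row has any one of them.
        options = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        return row[key] in options

    return [r for r in rows if any(matches(r, k, v) for k, v in criteria.items())]

# Same parameter grid as the sample files: a in 0..9, b in 10..19.
rows = [{"path": f"data/a_{a}/b_{b}.txt", "a": a, "b": b}
        for a in range(10) for b in range(10, 20)]

either = select(rows, a=0, b=10)          # a == 0 OR b == 10
both = select(select(rows, a=0), b=10)    # chained calls: a == 0 AND b == 10
print(len(either), len(both))
```

On a DataFrame, the same two selections can be written with ordinary pandas masks, `pf[(pf.a == 0) | (pf.b == 10)]` and `pf[(pf.a == 0) & (pf.b == 10)]`.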


In [9]:
# For more complicated selection criteria, one can always fall back to a pandas boolean mask
pf[(2 <= pf.a) & (pf.a <= 4)]

Unnamed: 0,path,a,b
20,data/a_2/b_10.txt,2,10
21,data/a_2/b_11.txt,2,11
22,data/a_2/b_12.txt,2,12
23,data/a_2/b_13.txt,2,13
24,data/a_2/b_14.txt,2,14
25,data/a_2/b_15.txt,2,15
26,data/a_2/b_16.txt,2,16
27,data/a_2/b_17.txt,2,17
28,data/a_2/b_18.txt,2,18
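29,data/a_2/b_19.txt,2,19
30,data/a_3/b_10.txt,3,10
31,data/a_3/b_11.txt,3,11
32,data/a_3/b_12.txt,3,12
33,data/a_3/b_13.txt,3,13
34,data/a_3/b_14.txt,3,14
35,data/a_3/b_15.txt,3,15
36,data/a_3/b_16.txt,3,16
37,data/a_3/b_17.txt,3,17
38,data/a_3/b_18.txt,3,18
39,data/a_3/b_19.txt,3,19
40,data/a_4/b_10.txt,4,10
41,data/a_4/b_11.txt,4,11
42,data/a_4/b_12.txt,4,12
43,data/a_4/b_13.txt,4,13
44,data/a_4/b_14.txt,4,14
45,data/a_4/b_15.txt,4,15
46,data/a_4/b_16.txt,4,16
47,data/a_4/b_17.txt,4,17
48,data/a_4/b_18.txt,4,18
49,data/a_4/b_19.txt,4,19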


## Using ParaFrame

The filtering mechanism is very handy when one wants to select some files to process:

In [10]:
for p in pf(a=0, b=10).path:
    print(f'Doing something with file "{p}"...')

Doing something with file "data/a_0/b_10.txt"...
Doing something with file "data/a_0/b_11.txt"...
Doing something with file "data/a_0/b_12.txt"...
Doing something with file "data/a_0/b_13.txt"...
Doing something with file "data/a_0/b_14.txt"...
Doing something with file "data/a_0/b_15.txt"...
Doing something with file "data/a_0/b_16.txt"...
Doing something with file "data/a_0/b_17.txt"...
Doing something with file "data/a_0/b_18.txt"...
Doing something with file "data/a_0/b_19.txt"...
Doing something with file "data/a_1/b_10.txt"...
Doing something with file "data/a_2/b_10.txt"...
Doing something with file "data/a_3/b_10.txt"...
Doing something with file "data/a_4/b_10.txt"...
Doing something with file "data/a_5/b_10.txt"...
Doing something with file "data/a_6/b_10.txt"...
Doing something with file "data/a_7/b_10.txt"...
Doing something with file "data/a_8/b_10.txt"...
Doing something with file "data/a_9/b_10.txt"...


## Debug

It is sometimes difficult to debug the format string used in ParaFrame.  In such cases, one can set `debug=True` in `ParaFrame()` to print debugging messages:

In [11]:
pf = ParaFrame("data/a_{a:d}/b_{b:d}.txt", debug=True)

0 data/a_{a:d}/b_{b:d}.txt () {}
1 data/a_{a:s}/b_{b:d}.txt () {'a': '*'}
2 data/a_{a:s}/b_{b:s}.txt () {'a': '*', 'b': '*'}
Pattern: "data/a_*/b_*.txt"
100 matches, e.g., "data/a_0/b_10.txt"
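
The `Pattern:` line in the debug output suggests that each replacement field ends up as a `*` wildcard. A minimal sketch of such a translation, assuming nothing about hallmark's internals, is shown here; `format_to_glob` is a hypothetical helper written for this illustration.

```python
import glob
from string import Formatter

def format_to_glob(fmt):
    """Hypothetical helper: replace every replacement field with the wildcard "*"."""
    pieces = []
    for literal, field, _spec, _conv in Formatter().parse(fmt):
        # Escape glob metacharacters that appear in the literal text.
        pieces.append(glob.escape(literal))
        if field is not None:
            pieces.append("*")
    return "".join(pieces)

pattern = format_to_glob("data/a_{a:d}/b_{b:d}.txt")
print(pattern)
```

When a format string matches no files, comparing a pattern like this against the actual directory layout, for example with `glob.glob(pattern)`, is one way to find the mismatch.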
