## fact.path Examples

# Path deconstruction

Sometimes one wants to iterate over a bunch of file paths and extract the (night, run) integer tuple from each path, often in order to retrieve information about each file from the RunInfo DB.

The paths often come from something like:

    paths = glob('/fact/raw/*/*/*/*')
    
Below I have defined a couple of example paths to deconstruct.
Note that not all of the `paths_for_parsing` contain the typical "yyyy/mm/dd" part.
Still, the `night` and `run` are found just fine.

In [1]:
from fact.path import parse
help(parse)

Help on function parse in module fact.path:

parse(path)
    Return a dict with {prefix, suffix, night, run} parsed from path.
    
    path: string
        any (absolute) path should be fine.



In [2]:
paths_for_parsing = [
     '/fact/raw/2016/01/01/20160101_011.fits.fz',
     '/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits',
     '/fact/aux/2016/01/01/20160101.log',
     '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root'
]

In [3]:
for path in paths_for_parsing:
    print(path)
    print(parse(path))
    print()

/fact/raw/2016/01/01/20160101_011.fits.fz
{'prefix': '/fact/raw', 'night': 20160101, 'run': 11, 'suffix': '.fits.fz'}

/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits
{'prefix': '/fact/aux', 'night': 20160101, 'run': None, 'suffix': '.FSC_CONTROL_TEMPERATURE.fits'}

/fact/aux/2016/01/01/20160101.log
{'prefix': '/fact/aux', 'night': 20160101, 'run': None, 'suffix': '.log'}

/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root
{'prefix': '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b', 'night': 20140115, 'run': 79, 'suffix': '_079.root'}
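
To make the outputs above easier to reason about, here is a minimal sketch of how such a deconstruction could be done with a regular expression. This is not the implementation used in `fact.path`; the name `parse_sketch` and both patterns are assumptions made for illustration only.

```python
import posixpath
import re

# Hypothetical pattern: an 8-digit night, optionally followed by "_" and a
# 3-digit run; everything after that is treated as the suffix.
NIGHT_RUN = re.compile(r'(?P<night>\d{8})(?:_(?P<run>\d{3}))?(?P<suffix>.*)')

# Hypothetical pattern for a trailing "/yyyy/mm/dd" in the directory part.
DATE_TREE = re.compile(r'/\d{4}/\d{2}/\d{2}$')


def parse_sketch(path):
    """Return a dict with prefix, suffix, night and run (illustrative only)."""
    dirname, basename = posixpath.split(path)
    match = NIGHT_RUN.match(basename)
    if match is None:
        raise ValueError('no night found in file name: {!r}'.format(basename))
    run = match.group('run')
    return {
        # Drop the yyyy/mm/dd part if present, otherwise keep the directory.
        'prefix': DATE_TREE.sub('', dirname),
        'night': int(match.group('night')),
        'run': int(run) if run is not None else None,
        'suffix': match.group('suffix'),
    }


print(parse_sketch('/fact/raw/2016/01/01/20160101_011.fits.fz'))
```

Under these assumptions the sketch yields the same values as shown above for the four example paths, including `run = None` for the aux files and the `'_079.root'` suffix for the file with two run ids.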



In [4]:
%timeit parse(paths_for_parsing[0])

The slowest run took 4.25 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.08 µs per loop


Parsing takes less than 10 µs per path, but we currently have on the order of 250k runs, so parsing all paths in the raw folder might take as long as 2.5 seconds.

However, `glob` usually takes much longer to collect all the paths in the first place, so parsing speed should not be an issue.

----


# Path construction

Equally often, people already have runs from the RunInfo DB and want to find the corresponding files, be it raw files, aux files, or other files that happen to sit in a similar tree-like directory structure, for example the photon-stream files.

The typical task starts with a (night, run) tuple and needs to create a path like
"/gpfs0/fact/processing/photon-stream/yyyy/mm/dd/night_run.phs.jsonl.gz"
or similar.

In [5]:
from fact.path import tree_path
help(tree_path)

Help on function tree_path in module fact.path:

tree_path(night, run, prefix, suffix)
    Make a tree_path from a (night, run) for given prefix, suffix
    
    night: int or string
        eg. 20160101 or '20160101'
    run: int or string
        eg. 11 or '011' or None (int, string or None accepted)
    prefix: string
        eg. '/fact/raw' or '/fact/aux'
    suffix: string
        eg. '.fits.fz' or '.log' or '.AUX_FOO.fits'



In [6]:
from functools import partial

night_run_tuples = [
    (20160101, 1),
    (20160101, 2),
    (20130506, 3),
]

In [7]:
photon_stream_path = partial(tree_path,
    prefix='/gpfs0/fact/processing/photon-stream',
    suffix='.phs.jsonl.gz'
)
for night, run in night_run_tuples:
    print(photon_stream_path(night, run))

/gpfs0/fact/processing/photon-stream/2016/01/01/20160101_001.phs.jsonl.gz
/gpfs0/fact/processing/photon-stream/2016/01/01/20160101_002.phs.jsonl.gz
/gpfs0/fact/processing/photon-stream/2013/05/06/20130506_003.phs.jsonl.gz


In [8]:
aux_path = partial(
    tree_path,
    prefix='/fact/aux',
    suffix='.FSC_CONTROL_TEMPERATURE.fits',
    run=None
)
for night, run in night_run_tuples:
    print(aux_path(night))

/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits
/fact/aux/2016/01/01/20160101.FSC_CONTROL_TEMPERATURE.fits
/fact/aux/2013/05/06/20130506.FSC_CONTROL_TEMPERATURE.fits
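
For readers who want to see what goes into such a path, the following is a rough sketch of how a (night, run) tuple could be turned into a tree path using only the standard library. It is not the code behind `tree_path`; the function name `tree_path_sketch` and the zero-padding widths are assumptions read off the outputs above.

```python
import posixpath


def tree_path_sketch(night, run, prefix, suffix):
    """Build prefix/yyyy/mm/dd/night[_run]suffix (illustrative only)."""
    night = int(night)
    # Split the integer yyyymmdd into its three components.
    year, month, day = night // 10000, (night // 100) % 100, night % 100
    basename = '{:08d}'.format(night)
    if run is not None:
        # Runs are zero-padded to three digits, e.g. 11 -> "011".
        basename += '_{:03d}'.format(int(run))
    return posixpath.join(
        prefix,
        '{:04d}'.format(year),
        '{:02d}'.format(month),
        '{:02d}'.format(day),
        basename + suffix,
    )


print(tree_path_sketch(20160101, 1, '/fact/raw', '.fits.fz'))
print(tree_path_sketch('20130506', None, '/fact/aux', '.log'))
```

Passing `run=None` simply leaves out the `_run` part of the file name, which is the behaviour the aux example above relies on.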



But what about more special cases? I sometimes copy files from ISDC or La Palma to my machine in order to work with them locally and try something out. In the past I often did not bother to recreate the yyyy/mm/dd directory structure, since I copied the files e.g. like this:

    scp isdc:/fact/aux/*/*/*/*.FSC_CONTROL_TEMPERATURE.fits ~/fact/aux_toy/.
    
In this case I cannot make use of `tree_path`, so do I have to roll my own solution again?

Nope! We have you covered. Assume you have a quite specialized path format, e.g. this one:

    '/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20140115_079_079.root'

 * the yyyy/mm/dd tree structure is missing, and
 * the file name contains **two** run ids, not one.
 
Just define a template for this filename, e.g. like this:

In [9]:
from fact.path import template_to_path
help(template_to_path)

Help on function template_to_path in module fact.path:

template_to_path(night, run, template, **kwargs)
    Make path from template and (night, run) using kwargs existing.
    
    night: int or string
        e.g. night = 20160102 (int)
        is used to create Y,M,D,N template values as:
        Y = "2016"
        M = "01"
        D = "02"
        N = "20160101"
    run: int or string
        e.g. run = 1  or run = "000000001"
        is used to create template value R = "001"
    template: string
        e.g. "/foo/bar/{Y}/baz/{R}_{M}_{D}.gz.{N}"
    kwargs:
        if template contains other place holders than Y,M,D,N,R
        kwargs are used to format these.



In [10]:
single_pe_path = partial(
    template_to_path,
    template='/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/{N}_{R}_{R}.root'
)

for night, run in night_run_tuples:
    print(single_pe_path(night, run))

/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_001_001.root
/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_002_002.root
/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20130506_003_003.root


Okay, but what if the second run id is not always the same as the first?

In that case you'll have to type a bit more:

In [11]:
single_pe_path_2runs = partial(
    template_to_path,
    template='/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/{N}_{R}_{run2:03d}.root'
)

for night, run in night_run_tuples:
    print(single_pe_path_2runs(night, run, run2=run+2))

/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_001_003.root
/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20160101_002_004.root
/home/guest/tbretz/gainanalysis.20130725/files/fit_bt2b/20130506_003_005.root
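
The placeholder mechanics can be illustrated with plain `str.format`. The sketch below is an assumption about how the Y, M, D, N and R values described in the help text could be derived and combined with extra keyword arguments; it is not the source of `template_to_path`, and the name `template_sketch` is made up for this example.

```python
def template_sketch(night, run, template, **kwargs):
    """Fill Y, M, D, N, R and any extra placeholders (illustrative only)."""
    night = '{:08d}'.format(int(night))
    values = dict(kwargs)  # extra placeholders such as run2
    values.update(
        Y=night[:4],
        M=night[4:6],
        D=night[6:8],
        N=night,
        R='{:03d}'.format(int(run)),  # run zero-padded to three digits
    )
    return template.format(**values)


print(template_sketch(20160101, 1, '/some/dir/{N}_{R}_{run2:03d}.root', run2=3))
```

Because the extra keyword arguments are passed on as integers, format specs such as `{run2:03d}` work in the template, whereas `{R}` is already a zero-padded string in this sketch.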
