# Using the ctapipe Provenance service

The provenance functionality is used automatically by most of ctapipe (particularly `ctapipe.core.Tool` and the functions in `ctapipe.io` and `ctapipe.utils`), so you normally don't have to work with it directly. It tracks both input and output files, as well as details of the machine and software environment in which a Tool executed.

Here we show some very low-level functions of this system:

In [1]:
from ctapipe.core import Provenance
from pprint import pprint

## Activities

The basis of Provenance is an *activity*, which is generally an executable or a step in a script. Activities can be nested (an activity can contain sub-activities), as shown below, but normally this is not required:

In [2]:
p = Provenance()  # note this is a singleton, so there is only ever one global provenance object
p.clear()
p.start_activity()
p.add_input_file("test.txt")

p.start_activity("sub")
p.add_input_file("subinput.txt")
p.add_input_file("anothersubinput.txt")
p.add_output_file("suboutput.txt")
p.finish_activity("sub")

p.start_activity("sub2")
p.add_input_file("sub2input.txt")
p.finish_activity("sub2")

p.finish_activity()

In [3]:
p.finished_activity_names

['sub', 'sub2', '/home/travis/virtualenv/python3.7.6/bin/python']
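
The order of the names above (sub-activities first, the enclosing activity last) is what you would expect if open activities are kept on a stack and moved to a "finished" list when they are closed. The following is only a toy sketch of that idea, not ctapipe's actual implementation: the class `ToyProvenance` and its attributes are hypothetical, and the activity name `"main"` stands in for the default name seen in the output above.

```python
class ToyProvenance:
    """Toy activity tracker illustrating nesting (NOT ctapipe's implementation)."""

    def __init__(self):
        self._open = []     # stack of activities that are still running
        self.finished = []  # activities in the order they were finished

    def start_activity(self, name):
        self._open.append({"activity_name": name, "input": [], "output": []})

    def add_input_file(self, path):
        # A file is attached to the innermost activity that is still open.
        self._open[-1]["input"].append(path)

    def finish_activity(self, name):
        activity = self._open.pop()
        if activity["activity_name"] != name:
            raise ValueError(f"innermost open activity is {activity['activity_name']!r}")
        self.finished.append(activity)


toy = ToyProvenance()
toy.start_activity("main")
toy.add_input_file("test.txt")

toy.start_activity("sub")
toy.add_input_file("subinput.txt")
toy.finish_activity("sub")

toy.start_activity("sub2")
toy.finish_activity("sub2")

toy.finish_activity("main")

finished_names = [a["activity_name"] for a in toy.finished]
print(finished_names)  # ['sub', 'sub2', 'main']
```

Because the outer activity is finished last, it ends up at the end of the list, and `test.txt` stays attached to it rather than to either sub-activity.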

Activities have associated input and output *entities* (files or other objects):

In [4]:
[(x['activity_name'], x['input']) for x in p.provenance]

[('sub',
  [{'url': '/home/travis/build/cta-observatory/ctapipe/docs/examples/subinput.txt',
    'role': None},
   {'url': '/home/travis/build/cta-observatory/ctapipe/docs/examples/anothersubinput.txt',
    'role': None}]),
 ('sub2',
  [{'url': '/home/travis/build/cta-observatory/ctapipe/docs/examples/sub2input.txt',
    'role': None}]),
 ('/home/travis/virtualenv/python3.7.6/bin/python',
  [{'url': '/home/travis/build/cta-observatory/ctapipe/docs/examples/test.txt',
    'role': None}])]

Activities track when they were started and finished:

In [5]:
[(x['activity_name'], x['duration_min']) for x in p.provenance]

[('sub', 0.00013333333333420683),
 ('sub2', 0.0001166666665675109),
 ('/home/travis/virtualenv/python3.7.6/bin/python', 0.004750000000051102)]
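
Each activity record also carries `start` and `stop` timestamps (`time_utc`), and the `duration_min` value appears consistent with their difference expressed in minutes. As a rough check, the sketch below recomputes the duration of the `sub` activity from the two timestamps recorded in this run. The helper `duration_minutes` and the `TIME_FORMAT` string are assumptions made for illustration, not part of ctapipe.

```python
from datetime import datetime

# Assumed layout of the 'time_utc' strings stored in the provenance record.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def duration_minutes(start_utc, stop_utc):
    """Return the time between two 'time_utc' strings, in minutes."""
    start = datetime.strptime(start_utc, TIME_FORMAT)
    stop = datetime.strptime(stop_utc, TIME_FORMAT)
    return (stop - start).total_seconds() / 60.0


# Start and stop times of the 'sub' activity from this run.
duration_min = duration_minutes("2020-07-05T11:44:23.860", "2020-07-05T11:44:23.868")
print(duration_min)
```

The result is about 1.33e-4 minutes (8 ms), matching the recorded `duration_min` up to the millisecond precision of the stored timestamps.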

## Full provenance

The provenance object is a list of activities, and for each one many details are collected:

In [6]:
p.provenance[0]

{'activity_name': 'sub',
 'activity_uuid': 'c5ea770f-91bf-4fa8-a68d-f000363d949f',
 'start': {'time_utc': '2020-07-05T11:44:23.860'},
 'stop': {'time_utc': '2020-07-05T11:44:23.868'},
 'system': {'ctapipe_version': '0.8.0.post10+git262b6aa',
  'ctapipe_resources_version': '0.3.0',
  'eventio_version': '1.4.1',
  'ctapipe_svc_path': None,
  'executable': '/home/travis/virtualenv/python3.7.6/bin/python',
  'platform': {'architecture_bits': '64bit',
   'architecture_linkage': '',
   'machine': 'x86_64',
   'processor': 'x86_64',
   'node': 'travis-job-dac37d3b-4b29-49d5-812a-4e9dcbe13e8e',
   'version': '#32-Ubuntu SMP Tue Feb 11 03:55:48 UTC 2020',
   'system': 'Linux',
   'release': '5.0.0-1031-gcp',
   'libcver': ('glibc', '2.2.5'),
   'num_cpus': 2,
   'boot_time': '2020-07-05T11:34:13.000'},
  'python': {'version_string': '3.7.6 (default, Dec 21 2019, 10:36:56) \n[GCC 7.4.0]',
   'version': ('3', '7', '6'),
   'compiler': 'GCC 7.4.0',
   'implementation': 'CPython'},
  'environment':

This can be better represented in JSON:

In [7]:
print(p.as_json(indent=2))

[
  {
    "activity_name": "sub",
    "activity_uuid": "c5ea770f-91bf-4fa8-a68d-f000363d949f",
    "start": {
      "time_utc": "2020-07-05T11:44:23.860"
    },
    "stop": {
      "time_utc": "2020-07-05T11:44:23.868"
    },
    "system": {
      "ctapipe_version": "0.8.0.post10+git262b6aa",
      "ctapipe_resources_version": "0.3.0",
      "eventio_version": "1.4.1",
      "ctapipe_svc_path": null,
      "executable": "/home/travis/virtualenv/python3.7.6/bin/python",
      "platform": {
        "architecture_bits": "64bit",
        "architecture_linkage": "",
        "machine": "x86_64",
        "processor": "x86_64",
        "node": "travis-job-dac37d3b-4b29-49d5-812a-4e9dcbe13e8e",
        "version": "#32-Ubuntu SMP Tue Feb 11 03:55:48 UTC 2020",
        "system": "Linux",
        "release": "5.0.0-1031-gcp",
        "libcver": [
          "glibc",
          "2.2.5"
        ],
        "num_cpus": 2,
        "boot_time": "2020-07-05T11:34:13.000"
      },
      "python": {
        "

## Storing provenance info in output files

* As it is, the provenance can be stored in something like an HDF5 file header, which allows hierarchies.
* Alternatively, try to flatten the data so it can be stored in a key=value header in a **FITS file** (using the FITS extended keyword convention to allow >8 character keywords), or as a table.

In [8]:
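def flatten_dict(y):
    """Flatten nested dicts and lists into a single dict with dot-separated keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '.')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '.')
        else:
            # leaf value: strip the trailing '.' from the accumulated key
            out[name[:-1]] = x

    flatten(y)
    return out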
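
To make the behaviour of this helper easier to see, here is a self-contained sketch that applies an equivalent function to a small hand-written record. The record is made up for illustration (it only mimics the shape of a provenance entry), and the function is repeated so the snippet runs on its own. It also shows one subtlety: only `dict` and `list` values are descended into, so a tuple is kept as a single value unless the data first goes through a JSON round trip, which turns tuples into lists.

```python
import json


def flatten_dict(nested):
    """Flatten nested dicts/lists into one dict with dot-separated keys."""
    flat = {}

    def _flatten(value, prefix=""):
        if isinstance(value, dict):
            for key, item in value.items():
                _flatten(item, f"{prefix}{key}.")
        elif isinstance(value, list):
            for index, item in enumerate(value):
                _flatten(item, f"{prefix}{index}.")
        else:
            flat[prefix[:-1]] = value  # drop the trailing "."

    _flatten(nested)
    return flat


# Hypothetical record with the same kind of nesting as a provenance entry.
record = {
    "activity_name": "sub",
    "input": [{"url": "subinput.txt", "role": None}],
    "system": {"platform": {"libcver": ("glibc", "2.2.5")}},
}

flat = flatten_dict(record)
print(flat["input.0.url"])              # subinput.txt
print(flat["system.platform.libcver"])  # ('glibc', '2.2.5')

# After a JSON round trip the tuple is a list, so it is flattened as well.
flat_json = flatten_dict(json.loads(json.dumps(record)))
print(flat_json["system.platform.libcver.0"])  # glibc
```

Note that an empty dict or list produces no key at all in the flattened result, so such entries silently disappear.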

In [9]:
d = dict(activity=p.provenance)

In [10]:
pprint(flatten_dict(d))

{'activity.0.activity_name': 'sub',
 'activity.0.activity_uuid': 'c5ea770f-91bf-4fa8-a68d-f000363d949f',
 'activity.0.duration_min': 0.00013333333333420683,
 'activity.0.input.0.role': None,
 'activity.0.input.0.url': '/home/travis/build/cta-observatory/ctapipe/docs/examples/subinput.txt',
 'activity.0.input.1.role': None,
 'activity.0.input.1.url': '/home/travis/build/cta-observatory/ctapipe/docs/examples/anothersubinput.txt',
 'activity.0.output.0.role': None,
 'activity.0.output.0.url': '/home/travis/build/cta-observatory/ctapipe/docs/examples/suboutput.txt',
 'activity.0.start.time_utc': '2020-07-05T11:44:23.860',
 'activity.0.status': 'sub',
 'activity.0.stop.time_utc': '2020-07-05T11:44:23.868',
 'activity.0.system.arguments.0': '/home/travis/virtualenv/python3.7.6/lib/python3.7/site-packages/ipykernel_launcher.py',
 'activity.0.system.arguments.1': '-f',
 'activity.0.system.arguments.2': '/tmp/tmp4vbfry4w.json',
 'activity.0.system.arguments.3': '--HistoryManager.hist_file=:memo