# Using the ctapipe Provenance service

Provenance is recorded automatically by most of ctapipe (particularly `ctapipe.core.Tool` and the functions in `ctapipe.io` and `ctapipe.utils`), so you normally don't have to work with it directly. It tracks input and output files, as well as details of the machine and software environment in which a Tool was executed.

Here we show some very low-level functions of this system:

In [1]:
from ctapipe.core import Provenance
from pprint import pprint

## Activities

The basis of Provenance is an *activity*, which is generally an executable or a step in a script. Activities can be nested (i.e. contain sub-activities), as shown below, but normally this is not required:

In [2]:
p = Provenance()  # note: this is a singleton, so there is only ever one global provenance object
p.clear()
p.start_activity()
p.add_input_file("test.txt")

p.start_activity("sub")
p.add_input_file("subinput.txt")
p.add_input_file("anothersubinput.txt")
p.add_output_file("suboutput.txt")
p.finish_activity("sub")

p.start_activity("sub2")
p.add_input_file("sub2input.txt")
p.finish_activity("sub2")

p.finish_activity()

In [3]:
p.finished_activity_names

['sub', 'sub2', '/opt/hostedtoolcache/Python/3.8.16/x64/bin/python']

Activities have associated input and output *entities* (files or other objects):

In [4]:
[(x['activity_name'], x['input']) for x in p.provenance]

[('sub',
  [{'url': '/home/runner/work/ctapipe/ctapipe/docs/examples/subinput.txt',
    'role': None},
   {'url': '/home/runner/work/ctapipe/ctapipe/docs/examples/anothersubinput.txt',
    'role': None}]),
 ('sub2',
  [{'url': '/home/runner/work/ctapipe/ctapipe/docs/examples/sub2input.txt',
    'role': None}]),
 ('/opt/hostedtoolcache/Python/3.8.16/x64/bin/python',
  [{'url': '/home/runner/work/ctapipe/ctapipe/docs/examples/test.txt',
    'role': None}])]

Activities track when they were started and finished, and record the elapsed time in minutes:

In [5]:
[(x['activity_name'], x['duration_min']) for x in p.provenance]

[('sub', 0.0002166666666880701),
 ('sub2', 0.0001833333334744225),
 ('/opt/hostedtoolcache/Python/3.8.16/x64/bin/python', 0.016799999999950188)]
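
As a minimal sketch of what `duration_min` represents, the value for `sub` can be roughly reproduced from the `start` and `stop` timestamps recorded for that activity. This assumes the duration is simply the difference between those two times, which this page does not state explicitly; the timestamps are printed to millisecond precision, so the agreement is only approximate.

```python
from datetime import datetime

# Timestamps copied from the 'sub' activity record shown on this page
start = datetime.fromisoformat("2023-02-02T15:26:29.051")
stop = datetime.fromisoformat("2023-02-02T15:26:29.064")

# Elapsed time converted from seconds to minutes
duration_min = (stop - start).total_seconds() / 60
print(f"{duration_min:.7f} min")
```

The result agrees with the recorded value for `sub` to within the millisecond rounding of the timestamps.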

## Full provenance

The provenance object is a list of activities, and for each one many details are collected:

In [6]:
p.provenance[0]

{'activity_name': 'sub',
 'activity_uuid': 'a99c7491-70c5-477d-9a20-d7035b40c02a',
 'start': {'time_utc': '2023-02-02T15:26:29.051'},
 'stop': {'time_utc': '2023-02-02T15:26:29.064'},
 'system': {'ctapipe_version': '0.16.1.dev637+gc4cb55ed.d20230202',
  'ctapipe_resources_version': 'not installed',
  'eventio_version': '1.11.0',
  'ctapipe_svc_path': None,
  'executable': '/opt/hostedtoolcache/Python/3.8.16/x64/bin/python',
  'platform': {'architecture_bits': '64bit',
   'architecture_linkage': 'ELF',
   'machine': 'x86_64',
   'processor': 'x86_64',
   'node': 'fv-az577-347',
   'version': '#38-Ubuntu SMP Mon Jan 9 12:49:59 UTC 2023',
   'system': 'Linux',
   'release': '5.15.0-1031-azure',
   'libcver': ('glibc', '2.35'),
   'n_cpus': 2,
   'boot_time': '2023-02-02T15:21:17.000'},
  'python': {'version_string': '3.8.16 (default, Jan 11 2023, 00:28:51) \n[GCC 11.3.0]',
   'version': ('3', '8', '16'),
   'compiler': 'GCC 11.3.0',
   'implementation': 'CPython',
   'packages': [{'name':

This can be better represented in JSON:

In [7]:
print(p.as_json(indent=2))

[
  {
    "activity_name": "sub",
    "activity_uuid": "a99c7491-70c5-477d-9a20-d7035b40c02a",
    "start": {
      "time_utc": "2023-02-02T15:26:29.051"
    },
    "stop": {
      "time_utc": "2023-02-02T15:26:29.064"
    },
    "system": {
      "ctapipe_version": "0.16.1.dev637+gc4cb55ed.d20230202",
      "ctapipe_resources_version": "not installed",
      "eventio_version": "1.11.0",
      "ctapipe_svc_path": null,
      "executable": "/opt/hostedtoolcache/Python/3.8.16/x64/bin/python",
      "platform": {
        "architecture_bits": "64bit",
        "architecture_linkage": "ELF",
        "machine": "x86_64",
        "processor": "x86_64",
        "node": "fv-az577-347",
        "version": "#38-Ubuntu SMP Mon Jan 9 12:49:59 UTC 2023",
        "system": "Linux",
        "release": "5.15.0-1031-azure",
        "libcver": [
          "glibc",
          "2.35"
        ],
        "n_cpus": 2,
        "boot_time": "2023-02-02T15:21:17.000"
      },
      "python": {
        "version_str

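Because the JSON form is plain text, it can be written to a file and read back with the standard library. The sketch below uses a hypothetical, heavily trimmed stand-in for `p.provenance` and a hypothetical file name, so that it runs without ctapipe; with the real object, the string returned by `p.as_json(indent=2)` could be written out in the same way.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, trimmed stand-in for p.provenance (not real ctapipe output)
records = [
    {
        "activity_name": "sub",
        "start": {"time_utc": "2023-02-02T15:26:29.051"},
        "system": {"platform": {"libcver": ("glibc", "2.35"), "n_cpus": 2}},
    }
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "provenance.json"  # hypothetical file name
    path.write_text(json.dumps(records, indent=2))
    loaded = json.loads(path.read_text())

# JSON has no tuple type, so the tuple is read back as a list
print(loaded[0]["system"]["platform"]["libcver"])
```

The same tuple-to-list conversion is visible for `libcver` when comparing the Python and JSON outputs above.
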
## Storing provenance info in output files

* This can already be stored in something like an HDF5 file header, which allows hierarchies.
* Try to flatten the data so it can be stored in a key=value header in a **FITS file** (using the FITS extended keyword convention to allow >8 character keywords), or as a table.

In [8]:
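def flatten_dict(y):
    """Flatten nested dicts and lists into a single dict with dotted keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '.')
        elif isinstance(x, list):
            for i, value in enumerate(x):
                flatten(value, name + str(i) + '.')
        else:
            # leaf value: drop the trailing '.' from the accumulated key
            out[name[:-1]] = x

    flatten(y)
    return out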
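
To make the key scheme concrete, here is a small self-contained variant of the same idea applied to a hypothetical, trimmed-down record. Unlike `flatten_dict` above, this sketch also descends into tuples and takes a configurable separator; both are choices made for this example, not part of the original function.

```python
def flatten(obj, prefix="", sep="."):
    """Return a flat dict mapping joined key paths to leaf values."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, (list, tuple)):
        items = enumerate(obj)
    else:
        # Leaf value: store it under the path accumulated so far
        return {prefix: obj}

    flat = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(value, path, sep))
    return flat


# Hypothetical, trimmed record (not real ctapipe output)
record = {
    "activity_name": "sub",
    "input": [{"url": "subinput.txt", "role": None}],
    "libcver": ("glibc", "2.35"),
}

flat = flatten({"activity": [record]})
for key, value in flat.items():
    print(f"{key} = {value!r}")
```

Each list or tuple index becomes one component of the key, so the `url` of the first input of the first activity ends up under `activity.0.input.0.url`.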

In [9]:
d = dict(activity=p.provenance)

In [10]:
pprint(flatten_dict(d))

{'activity.0.activity_name': 'sub',
 'activity.0.activity_uuid': 'a99c7491-70c5-477d-9a20-d7035b40c02a',
 'activity.0.duration_min': 0.0002166666666880701,
 'activity.0.input.0.role': None,
 'activity.0.input.0.url': '/home/runner/work/ctapipe/ctapipe/docs/examples/subinput.txt',
 'activity.0.input.1.role': None,
 'activity.0.input.1.url': '/home/runner/work/ctapipe/ctapipe/docs/examples/anothersubinput.txt',
 'activity.0.output.0.role': None,
 'activity.0.output.0.url': '/home/runner/work/ctapipe/ctapipe/docs/examples/suboutput.txt',
 'activity.0.start.time_utc': '2023-02-02T15:26:29.051',
 'activity.0.status': 'sub',
 'activity.0.stop.time_utc': '2023-02-02T15:26:29.064',
 'activity.0.system.arguments.0': '/opt/hostedtoolcache/Python/3.8.16/x64/lib/python3.8/site-packages/ipykernel_launcher.py',
 'activity.0.system.arguments.1': '-f',
 'activity.0.system.arguments.2': '/tmp/tmpz6p39t59.json',
 'activity.0.system.arguments.3': '--HistoryManager.hist_file=:memory:',
 'activity.0.system