# Analysis 

This was created by installing the development version of incense from git+https://github.com/JarnoRFB/incense.git at commit 4236128faae866b6bb7a5a32f0914812895b024e.

In [None]:
%load_ext autoreload
%autoreload 2

In [50]:
import sys
sys.path.append('.')
from datetime import datetime
import numpy as np
import matplotlib.pyplot as plt

import incense
from incense import ExperimentLoader
from pathlib import Path
from incense.experiment_loader import FileSystemExperimentLoader


basedir = Path("../logs/")
loader = FileSystemExperimentLoader(basedir)
loader


FileSystemExperimentLoader("../logs")

## Finding experiments

To use `incense`, we first have to instantiate an experiment loader, which lets us query the stored runs. Here that is a `FileSystemExperimentLoader` pointed at `../logs/`.

It is easiest to retrieve experiments by their id.

In [69]:
exp = loader.find_by_id(399)
print(dir(exp))
exp

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_data', '_load_artifacts', '_load_metrics', 'artifacts', 'from_run_dir', 'id', 'metrics']


FileSystemExperiment(id=399, name=100a_50d_l0)

It is also possible to find a set of experiments based on their configuration values. Multiple experiments are returned as a `QuerySet`, which behaves like a list but exposes some custom methods.
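As a rough illustration of what selecting runs by configuration value means, the sketch below filters plain dictionaries by a dotted config key. It does not use the `incense` API: `runs`, `get_path` and `find_by_config` are hypothetical names, and the run records are made-up stand-ins.

```python
# Hypothetical stand-in data: plain dicts, not incense experiment objects.
runs = [
    {"id": 1, "config": {"dataset": {"min_doc_freq": 0.1}, "seed": 11}},
    {"id": 2, "config": {"dataset": {"min_doc_freq": 0.2}, "seed": 22}},
    {"id": 3, "config": {"dataset": {"min_doc_freq": 0.1}, "seed": 33}},
]


def get_path(config, dotted_key):
    """Follow a dotted key such as 'dataset.min_doc_freq' through nested dicts."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


def find_by_config(runs, dotted_key, expected):
    """Return the runs whose config value at dotted_key equals expected."""
    return [run for run in runs if get_path(run["config"], dotted_key) == expected]


matches = find_by_config(runs, "dataset.min_doc_freq", 0.1)
print([run["id"] for run in matches])
```

The result is an ordinary list here. A richer container could add helper methods on top, which is the role the text above attributes to `QuerySet`.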

In [70]:
exp.status

'COMPLETED'

In [71]:
exp.start_time

'2020-12-01T21:35:09.750739'
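The start time comes back as an ISO 8601 string. A minimal sketch of turning it into a `datetime` for sorting or arithmetic, using only the standard library and the value shown above:

```python
from datetime import datetime

# Value copied from the cell output above.
start_time = "2020-12-01T21:35:09.750739"

# datetime.fromisoformat parses this format directly (Python 3.7+).
started = datetime.fromisoformat(start_time)
print(started.date(), started.strftime("%H:%M"))
```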

In [72]:
print(exp.result)

pmap({'value': 0.71, 'py/object': 'numpy.float64', 'dtype': 'float64'})
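The result above is stored as a mapping that wraps the number together with type information. A small sketch of pulling out the plain float, assuming the mapping always carries a `'value'` key as in this output; `unwrap_result` is a hypothetical helper, not part of `incense`, and a plain dict stands in for the `pmap`.

```python
from collections.abc import Mapping


def unwrap_result(result):
    """Return the plain number from a result stored as a mapping with a 'value' key."""
    if isinstance(result, Mapping) and "value" in result:
        return float(result["value"])
    # Anything else is passed through unchanged.
    return result


# Plain-dict stand-in with the same keys as the output above.
stored = {"value": 0.71, "py/object": "numpy.float64", "dtype": "float64"}
print(unwrap_result(stored))
```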


In [73]:
print(exp.captured_out)

INFO - 100a_50d_l0 - Running command 'kernelsvc'
INFO - 100a_50d_l0 - Started run with ID "399"
0    S↓NP S↓VP S↑ROOT NP↑S VP↓VBD VP↓SBAR VP↑S VBD↑...
1    S↓VP S↓. S↓_SP S↑ROOT VP↓VBP VP↓NP VP↓SBAR VP↑...
2    S↓S S↓CC S↓S S↑ROOT S↓NP S↓VP S↑S NP↓NP NP↓PP ...
3    SBARQ↓`` SBARQ↓MD SBARQ↓NP SBARQ↓VP SBARQ↓NFP ...
4    S↓NP S↓VP S↓. S↓_SP S↑ROOT NP↑S VP↓VBP VP↓NP V...
Name: path_pos_bigrams, dtype: object
(4999, 6)
(3499, 3265)
INFO - kernelsvc - kernelsvc__min_df_0.1_path_pos_bigrams
INFO - kernelsvc - acc=0.71, f1=0.7085348035879926
INFO - kernelsvc - {'max_iter': 1000, 'verbose': 0, 'kernel': 'poly'}
INFO - 100a_50d_l0 - Result: 0.71
INFO - 100a_50d_l0 - Completed after 0:03:08
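Since the captured output is plain text, scores that were only logged can be recovered with a regular expression. The sketch below assumes the `acc=..., f1=...` line format seen above; the `captured_out` string is a shortened copy of that output rather than a live `exp.captured_out`.

```python
import re

# Shortened copy of the captured output shown above.
captured_out = (
    "INFO - kernelsvc - acc=0.71, f1=0.7085348035879926\n"
    "INFO - 100a_50d_l0 - Result: 0.71\n"
)

# Named groups keep the mapping from score name to value explicit.
pattern = re.compile(r"acc=(?P<acc>[0-9.]+), f1=(?P<f1>[0-9.]+)")
match = pattern.search(captured_out)
scores = {name: float(value) for name, value in match.groupdict().items()} if match else {}
print(scores)
```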



In [74]:
exp.config

pmap({'dataset': pmap({'feature_column': 'path_pos_bigrams', 'name': '100 author 50 docs each pos_tags', 'min_doc_freq': 0.1, 'filename': './data/100A50D__doc+pos.pkl'}), 'seed': 854724710, 'name': 'kernelsvc__min_df_0.1_path_pos_bigrams'})

Attribute (dot) access, as used in the cells above, also works down to deeper levels of the data model.

In [85]:
exp.config.name

'kernelsvc__min_df_0.1_path_pos_bigrams'

Alternatively, the classic dictionary access notation can still be used. This is useful if the keys of the data model are not valid Python identifiers.
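To make the difference concrete, here is a minimal sketch with a hypothetical `DotDict` class that offers both notations. It is only a stand-in for illustration and not how `incense` implements its data model. A key such as `test.accuracy` contains a dot, so it is reachable only through bracket access.

```python
class DotDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


config = DotDict(name="kernelsvc__min_df_0.1_path_pos_bigrams", seed=854724710)
metrics = DotDict({"test.accuracy": 0.71})

# Both notations return the same value for identifier-like keys.
print(config.name == config["name"])

# 'test.accuracy' is not a valid identifier, so only bracket access works.
print("test.accuracy".isidentifier())
print(metrics["test.accuracy"])
```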

## Artifacts

`.artifacts` is a dict that maps from artifact names to artifact objects. The artifacts can be rendered according to their type by calling `.render()` on them. They can be saved locally by calling `.save()` on them. The artifact dict might be empty if the run was just restarted and did not yet finish an epoch.
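Because the artifact dict can be empty, indexing it by name may raise a `KeyError`. One possible guard is sketched below; `render_if_present` and `FakeArtifact` are hypothetical names used only for this illustration, and the only assumption about real artifacts is that they expose `.render()` as described above.

```python
def render_if_present(artifacts, name):
    """Render the named artifact, or return None when the run has no such artifact."""
    artifact = artifacts.get(name)
    if artifact is None:
        return None
    return artifact.render()


class FakeArtifact:
    """Stand-in object with a render() method, used only for this sketch."""

    def render(self):
        return "rendered"


print(render_if_present({}, "confusion_matrix"))
print(render_if_present({"confusion_matrix": FakeArtifact()}, "confusion_matrix"))
```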

In [86]:
exp.artifacts

{}

PNG artifacts will be shown as images by default.

In [None]:
exp.artifacts['confusion_matrix'].render()

In [None]:
exp.artifacts['confusion_matrix'].save()

CSV artifacts, in contrast, will be converted into `pandas.DataFrame` objects.

In [None]:
exp.artifacts['predictions'].render().head()

MP4 artifacts will be downloaded and embedded as an HTML element in the notebook. This can be useful for visualizing dynamics over time.

In [None]:
exp.artifacts['accuracy_movie'].render()

Finally, pickle artifacts will be restored to the Python object they originally represented. However, since `pickle` does not have a proper detectable content type, they will only be recognized as `Artifacts` without any more specific type. We can use the `as_type` method to interpret an artifact as an artifact of a more specific or just different type. In our example, we saved the data frame we already have as CSV as a pickle file as well.

In [None]:
pickle_artifact = exp.artifacts['predictions_df'].as_type(incense.artifact.PickleArtifact)
pickle_artifact.render().head()
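The restoring step rests on ordinary `pickle` round-tripping. The sketch below shows that in isolation with the standard library; the `records` list is made-up illustrative data, not the predictions from this run.

```python
import pickle

# Made-up illustrative records standing in for a saved object.
records = [
    {"target": 1, "prediction": 1},
    {"target": 0, "prediction": 1},
]

# Serialize to bytes, as would be written to a pickle file ...
payload = pickle.dumps(records)

# ... and restore an equal, but separate, Python object.
restored = pickle.loads(payload)
print(restored == records, restored is records)
```

The byte payload itself carries no reliable type marker for the stored object, which is consistent with the need to state the artifact type explicitly via `as_type`.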

## Metrics

If a path points to a non-scalar value, e.g. a metric, you can pass a dict that maps the path to a function reducing the values to a single value.

In [87]:
exp.config

pmap({'dataset': pmap({'feature_column': 'path_pos_bigrams', 'name': '100 author 50 docs each pos_tags', 'min_doc_freq': 0.1, 'filename': './data/100A50D__doc+pos.pkl'}), 'seed': 854724710, 'name': 'kernelsvc__min_df_0.1_path_pos_bigrams'})

In [88]:
exp.metrics['test.accuracy']

step
0    0.71
Name: test.accuracy, dtype: float64
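The idea of mapping a path to a reducing function can be sketched with plain Python containers. This is not the `incense` call itself: `metrics` and `reducers` are hypothetical names, and a list stands in for the metric series shown above.

```python
# Stand-in for the metric series above: one value recorded at step 0.
metrics = {"test.accuracy": [0.71]}

# Each path maps to a function that collapses the values to one number.
reducers = {"test.accuracy": max}

reduced = {path: reduce_fn(metrics[path]) for path, reduce_fn in reducers.items()}
print(reduced)
```

With several steps recorded, swapping `max` for another function (for example, one that takes the last value) changes how the series is summarized without touching the rest of the code.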