# Analysis filters and extractors



In [1]:
import openpathsampling as paths
from openpathsampling.experimental.storage import Storage, monkey_patch_all
paths = monkey_patch_all(paths)

from tqdm.auto import tqdm

In [2]:
from openpathsampling.analysis.filters import *

In [3]:
storage = Storage("./oneway.db", mode='r')

## Condition filters

Condition filters are associated with a condition that they test. We'll start by looking at the `RejectedSteps` condition filter, which (surprise!) checks whether a step was rejected.

In [4]:
rejected_steps.condition(storage.steps[0])

False

The initial condition is always accepted.

We can also use `RejectedSteps` as a filter on steps.

In [5]:
for step in rejected_steps(storage.steps):
    ...  # here is where you might do something to analyze rejection reasons

In [6]:
# we can use some basic Python to get a quick count of accepted steps
len(list(accepted_steps(storage.steps)))

559

This needs to be wrapped in a list because internally we're using generators. This means that we don't create the full list of objects, which can be useful if the object takes significant space in memory.
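
If you only need a count, you can skip the intermediate list entirely. The sketch below uses a plain Python generator as a stand-in for a filter (`even_numbers` is a hypothetical function, not part of OPS) to show that `sum(1 for _ in ...)` consumes a generator one item at a time:

```python
def even_numbers(values):
    """Hypothetical stand-in for a filter: lazily yields matching items."""
    for value in values:
        if value % 2 == 0:
            yield value

# len() does not work on a generator, so count by consuming it lazily
count_lazy = sum(1 for _ in even_numbers(range(10)))

# same result, but this builds the whole list in memory first
count_list = len(list(even_numbers(range(10))))
```

The same pattern should work with `accepted_steps(storage.steps)`, assuming the filter returns a generator as described above.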

Next we'll wrap the generator in `tqdm`, which will give us a progress bar to show how much has been processed. Obviously, making a Python list out of existing objects and getting its length is much faster than loading our shooting points from storage!

In [7]:
len(list(tqdm(shooting_steps(storage.steps))))

0it [00:00, ?it/s]

1000

`RejectedSteps`, `AcceptedSteps`, and `ShootingSteps` are filters on steps; we also have condition filters on samples:

In [8]:
sample = storage.steps[0].active[0]

In [9]:
replica(1).condition(sample)

False

In [10]:
replica(0).condition(sample)

True

## Combining condition filters

We can use logical combinations of condition filters to make much more complicated filters:

In [11]:
custom_filter = accepted_steps | rejected_steps
len(list(custom_filter(storage.steps)))

1001

In [12]:
custom_filter = rejected_steps & canonical_mover('ForwardShootMover')
len(list(custom_filter(storage.steps)))

153

`CanonicalMover` is another step-wise condition filter, but you can define the mover with either an instance of a `PathMover`, a subclass of `PathMover`, or the string name of a `PathMover` subclass. If you provide a specific instance, it will filter for only that instance. If you give a class or the name of a class, it will filter for all movers in that class.
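
To make the three ways of specifying a mover concrete, here is a minimal sketch of how such matching could work. This is not the OPS implementation: the classes below are empty stand-ins, `mover_matches` is a hypothetical helper, and matching a string against the exact class name is an assumption made for illustration.

```python
import inspect

class PathMover:
    """Empty stand-in for the OPS base mover class (illustration only)."""

class ForwardShootMover(PathMover):
    pass

class BackwardShootMover(PathMover):
    pass

def mover_matches(mover, spec):
    """Hypothetical helper: does `mover` match `spec`?

    `spec` may be a specific instance, a class, or a class name.
    """
    if isinstance(spec, str):
        # assumption: a string is compared to the mover's own class name
        return type(mover).__name__ == spec
    if inspect.isclass(spec):
        # a class matches every instance of that class (or its subclasses)
        return isinstance(mover, spec)
    # anything else is treated as one specific instance
    return mover is spec

fwd_a = ForwardShootMover()
fwd_b = ForwardShootMover()
bkwd = BackwardShootMover()

same_instance = mover_matches(fwd_a, fwd_a)                 # specific instance
other_instance = mover_matches(fwd_b, fwd_a)                # different instance
by_class = mover_matches(fwd_b, ForwardShootMover)          # any instance of class
by_name = mover_matches(fwd_a, 'ForwardShootMover')         # class given by name
wrong_name = mover_matches(bkwd, 'ForwardShootMover')
```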

These combinations can be arbitrarily complex (but remember boolean order of operations -- `&` has higher precedence than `|` -- or use parentheses!):

In [13]:
custom_filter = (
    rejected_steps & canonical_mover('ForwardShootMover')
    | accepted_steps & canonical_mover('BackwardShootMover')
)
len(list(custom_filter(storage.steps)))

337

In [14]:
# bad parens here give inherent contradiction
custom_filter = (
    rejected_steps 
    & (canonical_mover('ForwardShootMover') | accepted_steps)
    & canonical_mover('BackwardShootMover')
)
len(list(custom_filter(storage.steps)))

0
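
The effect of precedence and parentheses is easier to see on toy data. The sketch below defines a minimal composable filter (`ConditionFilter` here is a hypothetical class written for this illustration, not the OPS one) and applies the same three conditions with two different groupings:

```python
class ConditionFilter:
    """Hypothetical minimal condition filter; not the OPS implementation."""
    def __init__(self, condition, name=None):
        self.condition = condition
        self.name = name

    def __call__(self, items):
        # lazily yield only the items that satisfy the condition
        return (item for item in items if self.condition(item))

    def __and__(self, other):
        return ConditionFilter(
            lambda x: self.condition(x) and other.condition(x)
        )

    def __or__(self, other):
        return ConditionFilter(
            lambda x: self.condition(x) or other.condition(x)
        )

is_even = ConditionFilter(lambda n: n % 2 == 0, "even")
is_small = ConditionFilter(lambda n: n < 3, "small")
is_big = ConditionFilter(lambda n: n > 7, "big")

# `&` binds tighter than `|`: this is (is_even & is_small) | is_big
no_parens = list((is_even & is_small | is_big)(range(10)))

# explicit parentheses change the grouping
with_parens = list((is_even & (is_small | is_big))(range(10)))
```

Here `no_parens` is `[0, 2, 8, 9]` and `with_parens` is `[0, 2, 8]`: the odd number 9 only passes when `is_big` is OR-ed in at the top level.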

Currently, combining filters that act at different stages (for example, a step filter with a sample filter) raises an error. We'd love to make the code smart enough to sort that out for you, but for now you have to keep track of which filters are for steps and which are for samples. As a hint, a sample filter can only filter on ensemble, replica, or trajectory.

In [15]:
# rejected_steps & replica(0)  # mixes a step filter with a sample filter

## Extractors

Extractors extract a specific piece of information from the `MCStep` object. For example, there is the `TrialSamples` extractor, which extracts trials from the canonical move change:

In [16]:
trial_samples(storage.steps[1])

[<Sample @ 0x7f954cc372d0>]

In [17]:
shooting_points(storage.steps[3])

<openpathsampling.engines.toy.snapshot.ToySnapshot at 0x7f954cc6ae90>

## Extractor Filters

A very common practice in analysis would involve looping over steps (filtered by a step filter), extracting samples (e.g., trial samples or active samples) from that step, and then looping over those samples:

In [18]:
replica_0_filter = replica(0)
for step in rejected_steps(storage.steps):
    samples = trial_samples(step)
    for sample in replica_0_filter(samples):
        ... # do something with each sample

Since this is really common, we have created extractor-filters that do that for you. You can create an extractor-filter from an extractor with its `using` method. For the secondary filter (which is a sample filter here) you can chain on the `.with_filter` method.

In [19]:
ext_filt = trial_samples.using(rejected_steps).with_filter(replica(0))
for sample in ext_filt(storage.steps):
    ... # so easy!
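
Conceptually, an extractor-filter bundles the nested loop above into a single generator. The following sketch shows that idea on toy data; `extract_filtered` and the dictionary-based "steps" are hypothetical and only illustrate the control flow, not how OPS implements it:

```python
def extract_filtered(steps, step_condition, extractor, item_condition):
    """Hypothetical sketch: filter steps, extract items, filter items."""
    for step in steps:
        if not step_condition(step):
            continue  # primary (step) filter
        for item in extractor(step):
            if item_condition(item):  # secondary (item) filter
                yield item

# toy steps: each trial is a (replica, label) pair
toy_steps = [
    {"accepted": False, "trials": [(0, "t0")]},
    {"accepted": True, "trials": [(0, "t1")]},
    {"accepted": False, "trials": [(1, "t2"), (0, "t3")]},
]

rejected_replica_0 = list(extract_filtered(
    toy_steps,
    step_condition=lambda step: not step["accepted"],
    extractor=lambda step: step["trials"],
    item_condition=lambda trial: trial[0] == 0,
))
```

Only the replica-0 trials from the two rejected steps survive, so `rejected_replica_0` is `[(0, "t0"), (0, "t3")]`.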

Trial samples is an example of an extractor that returns multiple items -- for some moves, like replica exchange, there are multiple trials. By default, the extractor-filter will "flatten" this, so that you see a single stream of all the extracted results. If you instead want one list per step, you can pass `flatten=False` when calling the extractor-filter:

In [20]:
steps = list(storage.steps)[:10]
list(ext_filt(steps))

[<Sample @ 0x7f954cc36f10>, <Sample @ 0x7f954d67c150>]

In [21]:
list(ext_filt(steps, flatten=False))

[[<Sample @ 0x7f954cc36f10>], [<Sample @ 0x7f954d67c150>]]
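
The difference between the two modes is the same as the difference between chaining per-step lists together and keeping them separate. A minimal sketch with toy data, where `extract_all` and `get_trials` are hypothetical helpers rather than OPS functions:

```python
from itertools import chain

def get_trials(step):
    """Toy extractor: in this sketch a 'step' is just a list of labels."""
    return step

def extract_all(extractor, steps, flatten=True):
    """Hypothetical helper: apply `extractor` to each step."""
    per_step = (extractor(step) for step in steps)
    if flatten:
        # merge the per-step lists into one stream of items
        return chain.from_iterable(per_step)
    return per_step

toy_steps = [["a"], ["b", "c"]]

flat = list(extract_all(get_trials, toy_steps))
nested = list(extract_all(get_trials, toy_steps, flatten=False))
```

With `flatten=True` the step boundaries are lost (`["a", "b", "c"]`); with `flatten=False` each step keeps its own list (`[["a"], ["b", "c"]]`).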

### Extractor Filter Syntax

```python
ef = Extractor.using(StepFilter).with_filter(secondary_filter)
for obj in ef(steps):
    ...
    
# or
for obj in Extractor.using(StepFilter, steps=steps):
    ...
```

`StepFilter` is optional there; if not used, the default filter is `AllSteps`. `with_filter` is only valid if `Extractor` returns lists. The first notation is preferable for more complicated extractor-filters; the second is preferable for simple ones. As an example of a simple one:

```python
for obj in ShootingPoints.using(steps=steps):
    # gives None if no shooting point; use ShootingSteps to filter that out
    ...
```

## Predefined objects

### Extractors

* `ShootingPoints`
* `ActiveSamples`
* `TrialSamples`
* `ActiveEnsembles`
* canonical mover?
* `CanonicalDetails(detail_name)`

### Step Filters

* `AllSteps`
* `AcceptedSteps`
* `RejectedSteps`
* `CanonicalMover(mover)`
* `ShootingSteps`
* `TrialEnsemble(ensemble)`
* `TrialReplica(replica)`

### Secondary filters: Samples

* `Replica(replica)`
* `Ensemble(ensemble)`

## Custom Objects

### Extractors

For most cases, where you extract a single item (i.e., where you won't need a secondary filter), you can create your extractor as an instance of `Extractor`. For example, information about timing is stored in the root change of the move. You can write a function to extract it:

In [22]:
def get_timing(step):
    try:
        return step.change.details.timing
    except AttributeError:
        return NOT_EXTRACTED

Note that this catches `AttributeError`. That's because the initial conditions step doesn't have this information. With an `Extractor`, if the information you're seeking doesn't exist, you should return the constant object `filters.NOT_EXTRACTED`. If `flatten` is `True`, the extractor-filter will ignore this value. If `flatten` is `False`, the value for this step will be `filters.NOT_EXTRACTED`.
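
One reason to use a dedicated sentinel object instead of `None` is that `None` may itself be a legitimate extracted value. The sketch below illustrates the general pattern with a locally defined sentinel; `MISSING` and `get_timing_from_record` are hypothetical names for this example and are not the OPS objects:

```python
MISSING = object()  # hypothetical sentinel, standing in for NOT_EXTRACTED

def get_timing_from_record(record):
    """Return record['timing'], or the sentinel if the key is absent."""
    try:
        return record["timing"]
    except KeyError:
        return MISSING

records = [{}, {"timing": 0.5}, {"timing": None}]
extracted = [get_timing_from_record(record) for record in records]

# an identity check drops only the sentinel and keeps real values like None
kept = [value for value in extracted if value is not MISSING]
```

The first record has no timing and is dropped, while the explicit `None` in the third record is kept, so `kept` is `[0.5, None]`.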

In [23]:
timing = Extractor(get_timing, name="Timing")

In [24]:
timing_ef = timing.using(all_steps)
all_timings = list(timing_ef(storage.steps))

### Step Filters

Custom step filters are easy to implement using the `StepFilter` class. Just write a function to pass as the condition, and optionally provide a name.

For example, let's say we wanted to select steps where the CV named `'x'`, when applied to the shooting point, was less than 0.0.

In [25]:
cv = storage.cvs['x']

def condition(step):
    sp = shooting_points(step)
    if sp is NOT_EXTRACTED:
        return False
    return cv(sp) < 0.0

In [26]:
custom_filter = StepFilter(condition, 'my condition')

In [27]:
len(list(custom_filter(storage.steps)))

423

## Use scenario

For some reason replica 0 is having a lot of rejected backward shots. I want to look at those shooting point snapshots.

The old way: (see http://openpathsampling.org/latest/topics/data_objects.html)

```python
for step in steps:
    canonical = step.change.canonical
    if step.change.accepted:
        continue
        
    if not isinstance(canonical.mover, paths.BackwardShootMover):
        continue

    try:
        rep0 = paths.SampleSet(canonical.trials)[0]
    except KeyError:
        continue
        
    shooting_snap = canonical.details.shooting_snapshot
    ... # do stuff with shooting_snap
```

The new way:

```python
extractor = ShootingPoints.using(RejectedSteps
                                 & CanonicalMover('BackwardShootMover')
                                 & TrialReplica(0))
for shooting_snap in extractor(steps):
    ...  # do stuff with shooting_snap
```

With these, you no longer need to worry about the hierarchical structure that OPS uses to store data -- you create a description of what you want to extract, and you get it!

Other ideas (to be listed in documentation):

```python
unique_samples = TrialSamples.using(AcceptedSteps)
for sample in unique_samples(storage.steps):
    ...
```


```python
one_way = CanonicalMover('ForwardShootMover') | CanonicalMover('BackwardShootMover')
extract = ActiveSamples.using(AcceptedSteps & one_way).with_filter(Ensemble(ens1) | Ensemble(ens2))
for sample in extract(storage.steps):
    ...
```


```python
# track which ensemble a specific replica was associated with at each step
follow_replica = ActiveSamples.using(AllSteps).with_filter(Replica(0))
trace = [sample.ensemble for sample in follow_replica(storage.steps)]
```