# Catalog Validation

[ObsPy's event representation](https://docs.obspy.org/packages/obspy.core.html#event-metadata) is based on the [FDSN](http://www.fdsn.org/) [QuakeML standard](https://quake.ethz.ch/quakeml/), which is very comprehensive and arguably the best standard available. However, the `Catalog` object (and friends) can be difficult to work with for a few reasons:

1. Often the desired data is deeply nested and hard to aggregate
2. Identifying data relations depends on the complex behavior of ObsPy's `ResourceIdentifier`
3. Preferred objects are often not set
    
ObsPlus tries to solve all of these problems. The first is addressed by the [DataFrame Extractor](../utils/dataframeextractor.ipynb). The second and third are addressed by a collection of catalog validators that attempt to ensure all `resource_id`s point to the correct objects, set preferred objects, and perform other sanity checks. The default event validation function in ObsPlus is a bit opinionated and was built specifically for the NIOSH flavor of seismic event, but you may still find it useful. Additionally, you can create your own validation namespace and define validators for your own data/schema as described in the [validators documentation](validators.ipynb).

## Catalog setup
Let's create a catalog that has the following problems:

- `resource_id`s on arrivals no longer point to the correct picks (only possible to break on ObsPy versions <= 1.1.0)

- no preferred origins or magnitudes are set

ObsPlus will go through and set the `resource_id`s to point to the correct objects, and set each `preferred_{whatever}` to the last element in the `{whatever}s` list (for `whatever` in `['magnitude', 'origin', 'focal_mechanism']`).
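
The "last element wins" rule can be illustrated with a minimal sketch that uses plain Python objects. This is not ObsPlus' implementation: `set_preferred_ids` and the toy event are hypothetical, and leaving an already-set preferred id untouched is an assumption made for the illustration.

```python
from types import SimpleNamespace


def set_preferred_ids(event, names=("magnitude", "origin", "focal_mechanism")):
    """Point each unset preferred_<name>_id at the last item of event.<name>s.

    Hypothetical helper for illustration only; not part of ObsPlus.
    """
    for name in names:
        items = getattr(event, f"{name}s", [])
        attr = f"preferred_{name}_id"
        # Assumption: only fill in ids that are missing, and skip empty lists.
        if items and getattr(event, attr, None) is None:
            setattr(event, attr, items[-1].resource_id)
    return event


# A toy stand-in for an event with two origins and nothing else.
toy_event = SimpleNamespace(
    origins=[
        SimpleNamespace(resource_id="origin/1"),
        SimpleNamespace(resource_id="origin/2"),
    ],
    magnitudes=[],
    focal_mechanisms=[],
    preferred_origin_id=None,
    preferred_magnitude_id=None,
    preferred_focal_mechanism_id=None,
)

set_preferred_ids(toy_event)
print(toy_event.preferred_origin_id)
```

The last origin in the list becomes the preferred one, while the magnitude and focal mechanism ids stay unset because their lists are empty.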

In [None]:
import obspy
import obspy.core.event as ev

import obsplus

# create catalog 1
def create_cat1():
    """ a catalog with an arrival that doesn't refer to any pick """
    time = obspy.UTCDateTime('2017-09-22T08:35:00')
    wid = ev.WaveformStreamID(network_code='UU', station_code='TMU', 
                              location_code='', channel_code='HHZ')
    pick = ev.Pick(time=time, phase_hint='P', waveform_id=wid)
    arrival = ev.Arrival(pick_id=pick.resource_id, waveform_id=wid)
    origin = ev.Origin(time=time, arrivals=[arrival], latitude=45.5,
                       longitude=-111.1)
    description = ev.EventDescription(create_cat1.__doc__)
    event = ev.Event(origins=[origin], picks=[pick], 
                     event_descriptions=[description])
    cat = ev.Catalog(events=[event])
    # create a copy of the catalog. In older versions this would screw up
    # the resource ids, but the issue seems to be fixed now.
    cat.copy()
    return cat


cat = create_cat1() 
event = cat[0]

In [None]:
arrival = event.origins[-1].arrivals[-1]
pick = event.picks[-1]

## Validate
We can fix these two problems in place with the `validate_catalog` function:

In [None]:
obsplus.validate_catalog(cat)

In [None]:
print(event.preferred_origin())
arrival = event.origins[0].arrivals[0]
# now we will get the correct pick through the arrival object, even on older versions of obspy
print(arrival.pick_id.get_referred_object())

## Fail fast
For issues that ObsPlus doesn't know how to fix, an `AssertionError` will be raised. If you are generating or downloading catalogs, it may be useful to run them through the validation function right away so that you learn about any issue before trying to perform meaningful analysis.

For example, if we had an arrival that didn't refer to any known pick this could be a quality issue that you might like to know about.

In [None]:
# create a problem with the catalog
old_pick_id = cat[0].origins[0].arrivals[0].pick_id
cat[0].origins[0].arrivals[0].pick_id = None

try:
    obsplus.validate_catalog(cat)
except AssertionError as e:
    # include the error message so the failing check can be identified
    print(f'something is wrong with this catalog: {e}')

# undo the problem
cat[0].origins[0].arrivals[0].pick_id = old_pick_id
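
One way to apply the fail-fast idea when ingesting several catalogs is to wrap the validation call so that a failure reports which catalog caused it. The sketch below is only an illustration: `validate_on_load` and `toy_validate` are hypothetical names, and the toy validator stands in for `obsplus.validate_catalog` so the example runs without any seismic data.

```python
def validate_on_load(name, catalog, validate_func):
    """Validate a catalog as soon as it is obtained, naming it on failure.

    Hypothetical helper; validate_func is any callable that raises
    AssertionError for problems it cannot fix.
    """
    try:
        validate_func(catalog)
    except AssertionError as err:
        raise ValueError(f"catalog {name!r} failed validation: {err}") from err
    return catalog


def toy_validate(catalog):
    """Stand-in validator: a catalog must contain at least one event."""
    assert len(catalog) > 0, "catalog is empty"


good = validate_on_load("good", ["event"], toy_validate)

try:
    validate_on_load("bad", [], toy_validate)
except ValueError as err:
    print(err)
```

In real use the third argument would be `obsplus.validate_catalog`, so a bad download is reported by name before any analysis starts.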

## Adding custom validators
See the [section on validators](validators.ipynb) to learn how to create your own validators. If you simply want to use a subset of ObsPlus' validators, you can do so like this:

In [None]:
# import the validators that are desired
import obspy.core.event as ev
from obsplus.validate import validator, validate
from obsplus.events.validate import (
    attach_all_resource_ids, 
    check_arrivals_pick_id,
    check_duplicate_picks,
)
# create new validator namespace
namespace = '_new_test'
validator(namespace, ev.Event)(attach_all_resource_ids)
validator(namespace, ev.Event)(check_arrivals_pick_id)
validator(namespace, ev.Event)(check_duplicate_picks)

# run the new validator
validate(cat, namespace)
