# Events to DataFrame
ObsPlus provides a way to extract useful summary information from ObsPy objects in order to create dataframes. This transformation is lossy but very useful when the full complexity of the `Catalog` object isn't warranted. By default the `events_to_df` function collects information the authors of ObsPlus have found useful, but it is fully extensible/customizable using the functionality of the [DataFrameExtractor](../utils/dataframeextractor.ipynb).

To demonstrate how the `Catalog` can be flattened into a table, let's again use the Crandall catalog.

In [None]:
import obspy
import numpy as np
from matplotlib import pyplot as plt
from obspy.clients.fdsn import Client

import obsplus

In [None]:
crandall = obsplus.load_dataset('crandall_test')
cat = crandall.event_client.get_events()
ev_df = obsplus.events_to_df(cat)

ev_df.head()

`events_to_df` can also operate on other `event_client`s, like the `EventBank`.

In [None]:
bank = crandall.event_client
ev_df2 = obsplus.events_to_df(bank)
ev_df2.head()

Now we have access to all the wonderful Pandas magics. Here are a few contrived examples of things we may want to do:

In [None]:
# plot a histogram of magnitudes
ev_df.magnitude.hist()
plt.show()

Since there aren't many events, let's look at the picks to make things slightly more interesting:

In [None]:
# get pick info into a dataframe
picks = obsplus.picks_to_df(cat)

In [None]:
# count the types of phase picks made on all events
picks.phase_hint.value_counts()

In [None]:
# calculate the max pick_time for each event
picks.groupby('event_id')['time'].max()
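
The same `groupby` pattern extends to several aggregations at once. The following is a minimal sketch on a small synthetic table, not ObsPlus output: the event ids and pick times are made up for illustration, and only the `event_id` and `time` column names are taken from the cell above.

```python
import pandas as pd

# Synthetic stand-in for the picks dataframe (hypothetical ids and times).
picks = pd.DataFrame({
    "event_id": ["ev1", "ev1", "ev1", "ev2", "ev2"],
    "time": pd.to_datetime([
        "2007-08-06T08:48:40", "2007-08-06T08:48:42", "2007-08-06T08:48:45",
        "2007-08-07T02:14:24", "2007-08-07T02:14:27",
    ]),
})

# First and last pick time for each event.
span = picks.groupby("event_id")["time"].agg(["min", "max"])

# Time between the first and last pick of each event.
span["duration"] = span["max"] - span["min"]
```

Here `span` has one row per event, with the `duration` column holding a timedelta.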

We could also calculate travel time statistics grouped by station, for stations with at least 3 P picks.
Since all the events come from roughly the same location (within a few km), we might look for stations that have high standard deviations or obvious outliers as one of the first steps in quality control.

In [None]:
# get only P picks (copy so that adding a column below does not
# trigger pandas' SettingWithCopyWarning)
df = picks[picks.phase_hint.str.upper() == 'P'].copy()

# add columns for travel time
df['travel_time'] = df['time'] - df['event_time']

# find stations that have at least 3 P picks
station_count = df['station'].value_counts()
stations_with_three = station_count[station_count > 2]

# only include picks from stations that have at least 3 P picks
df = df[df.station.isin(stations_with_three.index)]

# get stats of travel times
df.groupby('station')['travel_time'].describe()
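
One possible way to go from these summary statistics to an actual quality-control flag is to mark picks that sit far from their station's median travel time. The sketch below runs on a small synthetic table, not on the Crandall picks: the station names, the travel times (plain floats in seconds), the helper `flag_outliers`, and the threshold of 3 scaled median absolute deviations are all assumptions made for this example. With real data, the timedelta column would first need converting, for instance with `df['travel_time'].dt.total_seconds()`.

```python
import pandas as pd

# Synthetic stand-in for the P-pick table; travel times are in seconds.
df = pd.DataFrame({
    "station": ["STA1"] * 5 + ["STA2"] * 5,
    "travel_time": [2.0, 2.1, 1.9, 2.0, 6.0, 3.0, 3.1, 2.9, 3.0, 3.1],
})


def flag_outliers(df, threshold=3.0):
    """Return a copy of df with a boolean 'outlier' column (hypothetical helper)."""
    # Median travel time of each station, aligned to the original rows.
    median = df.groupby("station")["travel_time"].transform("median")
    # Absolute deviation of each pick from its station median.
    abs_dev = (df["travel_time"] - median).abs()
    # Median absolute deviation (MAD) per station, aligned to the rows.
    mad = abs_dev.groupby(df["station"]).transform("median")
    out = df.copy()
    # 1.4826 scales the MAD so it is comparable to a standard deviation
    # for normally distributed data. If a station's MAD is zero, any
    # nonzero deviation at that station is flagged.
    out["outlier"] = abs_dev > threshold * 1.4826 * mad
    return out


flagged = flag_outliers(df)
```

The median-based measure is used here instead of the standard deviation because a single bad pick inflates the standard deviation of a small group and can hide itself; whether that trade-off suits a given dataset is a judgment call.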

In addition to `events_to_df` and `picks_to_df`, the following extractors are defined:

- `arrivals_to_df` extracts arrival information from an Origin object (or from the preferred origin of each event in a catalog)
- `amplitudes_to_df` extracts amplitude information
- `station_magnitudes_to_df` extracts station magnitude information from a catalog, event, or magnitude
- `magnitudes_to_df` extracts magnitude information