# Load & Process Detection Events
Once you've installed range driver, set up your configuration file, and organized your datasets (see the [installation](https://sfu-bigdata.github.io/range-driver/install.html), [configuration](SetupConfiguration.ipynb), and [data](DataSetup.ipynb) tutorials), you can load all of this data into a `Detections` object.

Below is the code you will have seen in the [configuration](SetupConfiguration.ipynb) tutorial. It takes care of loading the configuration into Python.

In [None]:
import range_driver as rd

# Load raw configuration
config = rd.yload(rd.load_file("path/to/example_config.yaml"))

# Prepare (and update) config for range_driver
rd.prepare_config(config)

## Loading Detection Events
Once the configuration is ready to be used, you can use range driver to create a `Detections` object. This object holds all the information related to the detection events, including the metadata and environmental data specified in your configuration file. 

In [None]:
dets = rd.Detections(config=config)

Once this object has been instantiated, you can view the raw detections data as a `DataFrame`.

In [None]:
dets.detection_events_df.head(5)

## Behind the Scenes of the `Detections` Object
The simple loading procedure above will suffice for almost all cases. In some cases, however, you may need to customize the process. If you're interested in learning more about what happens behind the scenes when the `Detections` object is created, check out the remaining sections in this tutorial. Otherwise, feel free to go directly to the [data visualization & EDA](DataVisualizationEDA.ipynb) tutorial.

By default, the instantiation of the detections object performs a number of processing tasks. Whether or not these processing steps are done is determined by the `do_processing` argument (defaults to `True`). 

In [None]:
dets = rd.Detections(config=config, do_processing=False)

But what does this "processing" step actually involve?

### Calculate Detection Rate
First, detection rates are calculated from the detection data loaded via the configuration file. A detection rate is the proportion of expected detections that actually occurred during a particular time period.

In [None]:
dets.make_detection_rate()  # Calculate detection interval lengths and split out init sequences
                            # Group detections into timestamp bins and analyze the detections on a group-level. 
                            # Append aggregated bin data to the end of the detections DF.

This is calculated over a few steps: 
1. First, the detections are examined to find whether an "initial sequence" exists. Often, a series of very frequent detections is observed at the very start of a range test. These are part of an "initial sequence" used during testing/setup. If such detections are found, they are split out and put into `dets.df_inits`.

2. Next, detections are grouped into timestamp bins, as specified in the configuration file.   

3. Using these time-bins a detection rate is calculated. This rate corresponds to the percentage of expected detections which were actually observed.  Expected detections are calculated based on the tag interval defined in the metadata. For example, if the average interval for a transmitter was 2 minutes and the time period was 30 minutes, we would expect 15 detections during that time period.   

4. Finally, the time-binned data and detection rates are appended to the end of the `dets.detection_df` DataFrame. The first index that contains this binned data is tracked in `dets.event_bin_split`.
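
The binning and rate calculation in steps 2 and 3 can be illustrated with plain pandas. This is only a minimal sketch, not range driver's implementation: the column names, the 30-minute bin length, and the 2-minute tag interval are assumptions chosen to match the example above, and initial-sequence handling is left out.

```python
import pandas as pd

# Illustrative values (assumptions, not read from a real configuration file).
tag_interval = pd.Timedelta(minutes=2)
bin_length = pd.Timedelta(minutes=30)

# Hypothetical detection events for one receiver/transmitter pair:
# 10 detections in the first half hour and 5 in the second.
times = list(pd.date_range("2020-01-01 00:00", periods=10, freq="3min")) + list(
    pd.date_range("2020-01-01 00:30", periods=5, freq="6min")
)
events = pd.DataFrame({"datetime": times, "transmitter": "T1"})

# Expected detections per bin = bin length / tag interval.
expected = bin_length / tag_interval

# Count the observed detections in each 30-minute bin.
observed = events.set_index("datetime")["transmitter"].resample("30min").count()

# Detection rate = observed / expected for each bin.
bins = pd.DataFrame(
    {"n_detections": observed, "detection_rate": observed / expected}
)
print(bins)
```

Here 15 detections are expected per bin, so the two bins end up with detection rates of 10/15 and 5/15.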


### Creates two views of the data (event vs bins)
Next, two separate views of the data are created. In addition to the event-based view of the data we have already seen (each row in the `DataFrame` is a detection event), a time-binned view of the data is also created.

This allows us to access either view independently. There is no need to even think about `dets.event_bin_split` anymore.

In [None]:
dets.events_df, dets.bins_df = dets.get_events_bins(dets.detection_df)

In this binned view each row in the `DataFrame` corresponds to a time period where a receiver/transmitter pair was expected to have detections (the length of which is specified in the configuration file). Usually, these bins are large enough that multiple detection events are contained within each bin. 

Since there is often a need for both the event-based and binned views of the data, we provide both. They can be accessed using the following `Detections` attributes: 

In [None]:
# Event-based
dets.events_df

# Time-binned
dets.bins_df
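
To picture what the split does, here is a small pandas sketch of separating a combined frame into its event rows and its appended bin rows. It is an illustration under assumptions: the column names and the positional meaning of the split index are hypothetical, and `get_events_bins` may work differently internally.

```python
import pandas as pd

# Hypothetical combined frame: 3 event rows followed by 2 appended bin rows.
combined = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2020-01-01 00:03",
                "2020-01-01 00:10",
                "2020-01-01 00:41",
                "2020-01-01 00:00",
                "2020-01-01 00:30",
            ]
        ),
        # Only bin rows carry a detection rate in this toy example.
        "detection_rate": [None, None, None, 2 / 15, 1 / 15],
    }
)

# Position of the first bin row (assumed here to be a positional index).
event_bin_split = 3

# Everything before the split is event data; everything after is binned data.
events_df = combined.iloc[:event_bin_split]
bins_df = combined.iloc[event_bin_split:].reset_index(drop=True)
```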

### Discovers the Receiver-Transmitter Groups
The next task in default processing is to create receiver-transmitter groups. This means that each group contains detections pertaining to a single receiver and a single transmitter. 

In [None]:
dets.prepare_rt_groups()

These groups can be used during visualization and analyses to compare behaviour between receiver/transmitter pairs. They are stored in `dets.mdb.rt_groups`. 

Additionally, distances between all receiver & transmitter stations are calculated and stored in `dets.mdb.station_dists_m`. 

In [None]:
dets.mdb.station_dists_m
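
For intuition about station distances in metres, the sketch below computes pairwise great-circle distances with the haversine formula. The station names and coordinates are made up, and the choice of haversine is an assumption: range driver may compute `station_dists_m` with a different method or store it in a different structure.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * radius_m * math.asin(math.sqrt(a))


# Hypothetical station coordinates as (latitude, longitude) in degrees.
stations = {"R1": (44.60, -63.50), "T1": (44.60, -63.49), "T2": (44.61, -63.50)}

# Distance in metres for every ordered pair of stations.
station_dists_m = {
    (a, b): haversine_m(*stations[a], *stations[b])
    for a in stations
    for b in stations
}
```

In this toy layout, `R1` and `T2` differ by 0.01 degrees of latitude, which works out to roughly 1.1 km.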

### Adds Environmental Data
Next comes the addition of environmental data to the detection DataFrames. By connecting environmental data with detections data, you can begin to examine the correlations between environmental conditions like wind, salinity, or bottom type and detection performance.

#### Environmental data from kadlu
The first environmental data that is loaded during processing comes from the [Kadlu](https://docs.meridian.cs.dal.ca/kadlu/) Python package. Provided you have specified kadlu data variables and sources in your configuration file, data from kadlu is automatically fetched from online data repositories (unless you have already retrieved that specific data), saved to a local database, and then interpolated & merged with the detection data.

In [None]:
dets.add_env_data()

#### Custom Environmental Data
Next, environmental data from custom data sources is merged with detections data. This data should be specified under the configuration file's `file_map` key and provided as a compatible filetype (e.g. NetCDF). This data is loaded from file, interpolated, and merged with the detections data. 

In [None]:
dets.add_custom_data()

#### Tidal Data
The third kind of environmental data that is added is tidal data. If a `tidal` entry has been defined in your configuration file, then the tidal data will be loaded, and tidal height & phase will be interpolated and merged into the detections DataFrame.

In [None]:
dets.add_tidal_data()
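
The "interpolate and merge" idea can be sketched with NumPy and pandas. This is a simplified illustration that assumes linear interpolation and hypothetical column names; it covers tidal height only, and range driver's own interpolation may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly tidal heights.
tide = pd.DataFrame(
    {
        "datetime": pd.date_range("2020-01-01 00:00", periods=4, freq="60min"),
        "tide_height_m": [0.0, 1.0, 2.0, 1.0],
    }
)

# Hypothetical detection events falling between the tidal samples.
events = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2020-01-01 00:30", "2020-01-01 01:15", "2020-01-01 02:30"]
        )
    }
)

# Express both time axes as seconds since the first tidal sample,
# so that np.interp can operate on plain numbers.
t0 = tide["datetime"].iloc[0]
x_tide = (tide["datetime"] - t0).dt.total_seconds()
x_events = (events["datetime"] - t0).dt.total_seconds()

# Linearly interpolate tidal height at each detection time and attach it.
events["tide_height_m"] = np.interp(x_events, x_tide, tide["tide_height_m"])
print(events)
```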

#### Calculated Columns
Finally, any "calculated columns" specified in your configuration file are calculated and merged into the detections data.

In [None]:
dets.add_calculated_columns()

### Prepare Data
Finally, the data is prepared for uses like analysis and visualization. There are two components to this step. 

First, the merged data (detections + environmental data) is split into event-based and bin-based views (similar to what happens in step 2 of processing). These views are saved in `dets.detection_events_df` and `dets.detection_bins_df`.

In [None]:
# Events
dets.detection_events_df

# Time Bins
dets.detection_bins_df

Second, the merged data is grouped by receiver/transmitter pair. These groups are stored in a list, accessible via `dets.rt_group_detections`.

In [None]:
dets.rt_group_detections
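
As a rough picture of this grouping, the sketch below builds a list of per-pair DataFrames with a pandas `groupby`. The column names and values are hypothetical, and the actual contents of `dets.rt_group_detections` may be structured differently.

```python
import pandas as pd

# Hypothetical merged detections covering three receiver/transmitter pairs.
detections = pd.DataFrame(
    {
        "receiver": ["R1", "R1", "R2", "R1", "R2"],
        "transmitter": ["T1", "T2", "T1", "T1", "T1"],
        "detection_rate": [0.8, 0.4, 0.6, 0.7, 0.5],
    }
)

# One DataFrame per (receiver, transmitter) pair, collected in a list.
# groupby sorts by key, so the order is (R1, T1), (R1, T2), (R2, T1).
rt_group_detections = [
    group.reset_index(drop=True)
    for _, group in detections.groupby(["receiver", "transmitter"])
]

for group in rt_group_detections:
    print(group["receiver"].iloc[0], group["transmitter"].iloc[0], len(group))
```

Keeping each pair in its own frame makes it straightforward to compare detection performance between pairs.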