## Step 3: Coregistration

In this step, we connect the mineral deposits selected in Step 2 with the trench sample points generated in Step 1. The mineral deposits CSV file contains only five columns: index, longitude, latitude, age and plate id. These attributes alone are not enough for the machine learning analysis. To obtain features associated with the deposits, we need to link each mineral deposit to the trench sample points. We call this process coregistration.

The coregistration method is simple. For a given mineral deposit, the coregistration process tries to find the nearest trench point within a certain region. If one is found, the subduction convergence kinematics statistics of that trench point are associated with the mineral deposit. The attributes retrieved from the trench sample points are used later as input data for the machine learning models.
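
The nearest-point search described above can be sketched as follows. This is only an illustration, not the code used by the coregistration script: the function names are hypothetical, and it assumes that the region is a search radius in degrees of arc and that distance is measured along a great circle.

```python
import numpy as np

def angular_distance_deg(lon1, lat1, lon2, lat2):
    """Great-circle distance in degrees between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    # haversine formula, which stays accurate for small separations
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))

def nearest_trench_point(deposit_lon, deposit_lat, trench_lons, trench_lats, region):
    """Return (index, distance) of the nearest trench point within `region`
    degrees of the deposit, or (None, None) if no point is close enough.
    Assumes at least one trench point is given."""
    distances = angular_distance_deg(
        deposit_lon, deposit_lat,
        np.asarray(trench_lons, dtype=float),
        np.asarray(trench_lats, dtype=float),
    )
    idx = int(np.argmin(distances))
    if distances[idx] <= region:
        return idx, float(distances[idx])
    return None, None

# made-up coordinates, for illustration only
idx, dist = nearest_trench_point(8.0, 0.0, [0.0, 10.0, 50.0], [0.0, 0.0, 50.0], region=5)
print(idx, round(dist, 6))
```

In this toy example the deposit at longitude 8 on the equator is matched to the second trench point, two degrees away. A deposit with no trench point inside the region gets no match.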

First, let's run the coregistration script and see what happens. The script is configured via parameters.py, which sets, among other things, the input mineral deposits file, the output file name and the region of interest.

Relevant parameters in [parameters.py](parameters.py):

* input_file
* output_dir
* regions
* vector_files
* grid_files
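
The `vector_files` entry is a file name template rather than a fixed file name. The notebook output further down prints it as `{conv_dir}subStats_{time:.2f}.csv`. The snippet below is a minimal sketch of how such a template expands with Python's `str.format`; the directory `convergence_data/` and the time `230` are hypothetical values chosen for illustration, and the script itself may fill in the template differently.

```python
# file name template as printed by the notebook for parameters['vector_files']
template = '{conv_dir}subStats_{time:.2f}.csv'

# 'convergence_data/' and 230 are hypothetical values, used only for illustration
file_name = template.format(conv_dir='convergence_data/', time=230)
print(file_name)  # convergence_data/subStats_230.00.csv
```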


In [1]:
from parameters_n1 import parameters
import Utils_c1 as Utils

#let's print out some of the parameters
#you can change the 'input_file' in parameters.py to use different mineral deposits. 
#Remember the files we have created in step 2?
print('The file name of the mineral deposits: ', parameters['coreg_input_files'])
print('The output folder: ', Utils.get_coreg_input_dir())
print('The region of interest(in degree): ', parameters['regions'])
print('The subduction convergence kinematics statistics file name template: ', parameters['vector_files'])
print('\n')

import coregistration_c1 as coregistration
#run the coregistration script
coregistration.run()
#some files should have been created at this point
#let's move to the next cell and check the results

The file name of the mineral deposits:  ['02_NA_Clennett_Positives_PlateID.csv', '02_NA_Clennett_Negatives_1_PlateID.csv', 'deposit_candidates.csv']
The output folder:  test-case-clennett/coreg_input/
The region of interest(in degree):  [5, 10]
The subduction convergence kinematics statistics file name template:  ['{conv_dir}subStats_{time:.2f}.csv']


running coregistration...
['test-case-clennett/coreg_input/02_NA_Clennett_Positives_PlateID.csv', 'test-case-clennett/coreg_input/02_NA_Clennett_Negatives_1_PlateID.csv', 'test-case-clennett/coreg_input/deposit_candidates.csv']
processing test-case-clennett/coreg_input/02_NA_Clennett_Positives_PlateID.csv ***********************************
querying {conv_dir}subStats_{time:.2f}.csv
region of interest: 5
the length of input data is: 272
region of interest: 10
the length of input data is: 82
(272, 5)
(272, 24)
['index', 'lon', 'lat', 'age', 'plate_id', 'recon_lon', 'recon_lat', 'distance', 'sub_idx', 'trench_lon', 'trench_lat', 'conv_rate

In [2]:
from parameters_n2 import parameters
import Utils_c2 as Utils

#let's print out some of the parameters
#you can change the 'input_file' in parameters.py to use different mineral deposits. 
#Remember the files we have created in step 2?
print('The file name of the mineral deposits: ', parameters['coreg_input_files'])
print('The output folder: ', Utils.get_coreg_input_dir())
print('The region of interest(in degree): ', parameters['regions'])
print('The subduction convergence kinematics statistics file name template: ', parameters['vector_files'])
print('\n')

import coregistration_c2 as coregistration
#run the coregistration script
coregistration.run()
#some files should have been created at this point
#let's move to the next cell and check the results

The file name of the mineral deposits:  ['02_NA_Clennett_Positives_PlateID.csv', '02_NA_Clennett_Negatives_2_PlateID.csv', 'deposit_candidates.csv']
The output folder:  test-case-clennett/coreg_input/
The region of interest(in degree):  [5, 10]
The subduction convergence kinematics statistics file name template:  ['{conv_dir}subStats_{time:.2f}.csv']


running coregistration...
['test-case-clennett/coreg_input/02_NA_Clennett_Positives_PlateID.csv', 'test-case-clennett/coreg_input/02_NA_Clennett_Negatives_2_PlateID.csv', 'test-case-clennett/coreg_input/deposit_candidates.csv']
processing test-case-clennett/coreg_input/02_NA_Clennett_Positives_PlateID.csv ***********************************
querying {conv_dir}subStats_{time:.2f}.csv
region of interest: 5
the length of input data is: 272
region of interest: 10
the length of input data is: 82
(272, 5)
(272, 24)
['index', 'lon', 'lat', 'age', 'plate_id', 'recon_lon', 'recon_lat', 'distance', 'sub_idx', 'trench_lon', 'trench_lat', 'conv_rate

In [None]:
from parameters_n3 import parameters
import Utils_c3 as Utils

#let's print out some of the parameters
#you can change the 'input_file' in parameters.py to use different mineral deposits. 
#Remember the files we have created in step 2?
print('The file name of the mineral deposits: ', parameters['coreg_input_files'])
print('The output folder: ', Utils.get_coreg_input_dir())
print('The region of interest(in degree): ', parameters['regions'])
print('The subduction convergence kinematics statistics file name template: ', parameters['vector_files'])
print('\n')

import coregistration_c3 as coregistration
#run the coregistration script
coregistration.run()
#some files should have been created at this point
#let's move to the next cell and check the results

In [None]:
from parameters_n4 import parameters
import Utils_c4 as Utils

#let's print out some of the parameters
#you can change the 'input_file' in parameters.py to use different mineral deposits. 
#Remember the files we have created in step 2?
print('The file name of the mineral deposits: ', parameters['coreg_input_files'])
print('The output folder: ', Utils.get_coreg_input_dir())
print('The region of interest(in degree): ', parameters['regions'])
print('The subduction convergence kinematics statistics file name template: ', parameters['vector_files'])
print('\n')

import coregistration_c4 as coregistration
#run the coregistration script
coregistration.run()
#some files should have been created at this point
#let's move to the next cell and check the results

In [None]:
from parameters_n5 import parameters
import Utils_c5 as Utils

#let's print out some of the parameters
#you can change the 'input_file' in parameters.py to use different mineral deposits. 
#Remember the files we have created in step 2?
print('The file name of the mineral deposits: ', parameters['coreg_input_files'])
print('The output folder: ', Utils.get_coreg_input_dir())
print('The region of interest(in degree): ', parameters['regions'])
print('The subduction convergence kinematics statistics file name template: ', parameters['vector_files'])
print('\n')

import coregistration_c5 as coregistration
#run the coregistration script
coregistration.run()
#some files should have been created at this point
#let's move to the next cell and check the results

In [None]:
import pandas as pd
import Utils

#read in the coregistration output file
data = pd.read_csv(Utils.get_coreg_output_dir() + "positive_deposits_c1.csv") 
display(data.head())  #let's print the first 5 rows

#print(data.columns)
#print('\nThe meaning of the columns: \n')
Utils.print_columns()

input_data = pd.read_csv(Utils.get_coreg_input_dir() + "02_NA_Clennett_Positives_PlateID.csv")
display(input_data)

#the input data and output data have the same length
print('The shape of the output data: ', data.shape)
print('The shape of the input data: ',input_data.shape)

As the code cell above shows, the input data and the output data have the same length. This means that, for each input mineral deposit, there is one corresponding row in the output file.

The coregistration program takes the mineral deposit coordinates and uses the age and plate id to reconstruct the deposits back in time. The program then searches for a nearby subduction trench and, if one is found, copies its subduction convergence kinematics statistics to the deposit.
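
The log printed by the coregistration cells reports "region of interest: 5" with 272 input rows, followed by "region of interest: 10" with 82 input rows. One plausible reading, which is an inference from the log and not something confirmed by the script's source, is that deposits left unmatched within the first region are retried with the next, wider one. A minimal sketch of that fallback logic, with a hypothetical function name and made-up distances:

```python
def match_with_regions(nearest_distances, regions):
    """Assign each deposit the first region (in degrees) that contains its
    nearest trench point.

    nearest_distances: distance in degrees from each deposit to its nearest
    trench point. Returns one entry per deposit: the matching region, or
    None when no region is wide enough.
    """
    matched = [None] * len(nearest_distances)
    pending = list(range(len(nearest_distances)))
    for region in sorted(regions):
        still_pending = []
        for i in pending:
            if nearest_distances[i] <= region:
                matched[i] = region
            else:
                # keep the deposit for the next, wider region
                still_pending.append(i)
        pending = still_pending
    return matched

# made-up distances, for illustration only
print(match_with_regions([1.2, 7.5, 30.0], regions=[5, 10]))  # [5, 10, None]
```

Under this reading every deposit keeps its row in the output whether or not a trench point is found, which would be consistent with the input and output files having the same length.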