This notebook goes through a simple binary classification example, explaining library functionality along the way.

In [1]:
import ABCD_ML

In [2]:
#Define directory with the 2.0_NDA_Data
nda_dr = '/mnt/sdb2/2.0_ABCD_Data_Explorer/2.0_NDA_Data/'

#This file stores the name mapping
test_mapping_loc = nda_dr + 'ABCD_Release_ Notes_Data_Release_ 2.0/22. ABCD_Release_2.0_mapping_r.csv'

#We will use only the sMRI data as the neuroimaging data
test_data_loc1 = nda_dr + 'MRI/ABCD sMRI Part 1.csv'
test_data_loc2 = nda_dr + 'MRI/ABCD sMRI Part 2.csv'

#We will load target data (and covariate data) from here
test_target_loc = nda_dr + 'Mental Health/ABCD Parent Demographics Survey.csv'

#We will load stratification data from here
test_strat_loc = nda_dr + 'Other Non-Imaging/ABCD ACS Post Stratification Weights.csv'
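
Since these paths are built by string concatenation and several contain spaces, it can be worth confirming they exist before loading anything. The following is a minimal sketch using only the standard library; `find_missing` is a hypothetical helper written for this example and is not part of ABCD_ML.

```python
import os

def find_missing(paths):
    """Return the subset of `paths` that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

# Demonstration with one path that exists (the current directory)
# and one made-up file name that is assumed not to exist.
example_paths = [
    os.getcwd(),
    os.path.join(os.getcwd(), "no_such_file_for_demo.csv"),
]
missing = find_missing(example_paths)
print("Missing paths:", missing)
```

In this notebook the helper would be called on the variables defined above, for example `find_missing([test_mapping_loc, test_data_loc1, test_data_loc2, test_target_loc, test_strat_loc])`, and an empty list would mean every file was found.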

We first need to define the class object, which we will use to load data and to train/test different ML models.
There are a few global parameters we can optionally set when defining this object, so let's look at what they are.

In [3]:
help(ABCD_ML.ABCD_ML.__init__)

Help on function __init__ in module ABCD_ML.ABCD_ML:

__init__(self, eventname='baseline_year_1_arm_1', use_default_subject_ids=True, default_na_values=['777', '999'], n_jobs=1, original_targets_key='targets', verbose=True)
    Main class init
    
    Parameters
    ----------
    eventname : str or None, optional
        Optional value to provide, specifying to keep certain rows
        when reading data based on the eventname flag.
        As ABCD is a longitudinal study, this flag lets you select only
        one specific time point, or if set to None, will load everything.
        (default = baseline_year_1_arm_1)
    
    use_default_subject_ids : bool, optional
        Flag to determine the usage of 'default' subject id behavior.
        If set to True, this will convert input NDAR subject ids
        into upper case, with prepended NDAR_ - type format.
        If set to False, then all input subject names must be entered
        explicitly the same, no preprocessing will be don
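
To make the two parameters described above more concrete, here is a rough sketch of what they appear to do, based only on the help text. It is not the library's implementation: `filter_by_eventname` and `normalize_subject_id` are hypothetical helpers, the exact id format is an assumption, and the rows are made-up examples.

```python
def filter_by_eventname(rows, eventname):
    """Keep only rows at one time point; eventname=None keeps everything."""
    if eventname is None:
        return list(rows)
    return [r for r in rows if r.get("eventname") == eventname]

def normalize_subject_id(raw_id):
    """Assumed 'default' behavior: upper-case the id and ensure an NDAR_ prefix."""
    sid = str(raw_id).strip().upper()
    if sid.startswith("NDAR_"):
        return sid
    if sid.startswith("NDAR"):
        return "NDAR_" + sid[len("NDAR"):]
    return "NDAR_" + sid

# Made-up rows: the same subject written two ways, at two time points.
rows = [
    {"subject": "ndar_invabc123", "eventname": "baseline_year_1_arm_1"},
    {"subject": "NDARINVABC123", "eventname": "1_year_follow_up_y_arm_1"},
]

baseline = filter_by_eventname(rows, "baseline_year_1_arm_1")
ids = [normalize_subject_id(r["subject"]) for r in rows]
print(len(baseline), ids)
```

Under these assumptions, both spellings of the subject id collapse to the same string, which is what allows rows from different source files to be matched on subject.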

Most of the default parameters are okay for this simple example, but any of them can be changed. Let's change n_jobs to 4 instead of 1.

In [4]:
ML = ABCD_ML.ABCD_ML(n_jobs = 4)

ABCD_ML object initialized


Next, we can optionally load a name map. This is simply a dictionary used to rename columns as they are loaded: any column whose name is a key in the dictionary is renamed to the corresponding value, and all other columns are left unchanged. This is useful for ABCD data, as the default column names are not always descriptive.

In [5]:
ML.load_name_mapping(loc = test_mapping_loc,
                     source_name_col="NDAR name",
                     target_name_col="REDCap name/NDA alias")

Loaded map file
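
The renaming rule described above can be sketched in a few lines of plain Python. This only illustrates the idea and is not how ABCD_ML implements it; `apply_name_map` is a hypothetical helper and the column names are invented.

```python
def apply_name_map(columns, name_map):
    """Rename each column found as a key in name_map; leave the rest untouched."""
    return [name_map.get(col, col) for col in columns]

# Invented example: one column has an entry in the map, one does not.
name_map = {"raw_col_001": "readable_name"}
columns = ["raw_col_001", "unmapped_col"]

renamed = apply_name_map(columns, name_map)
print(renamed)
```

Only `raw_col_001` is renamed here, because `unmapped_col` has no entry in the dictionary.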


We can look at what exactly is in this dictionary if we want to confirm we loaded it correctly.