Framework for classical machine learning analyses, specialized in MRI data

ritterlab/mri_learning

 
 

machine_learning

Schizophrenia

Repository for analysis of schizophrenia dataset

This project requires Python 3 and the dependencies specified in environment.yml.

To use the classical machine learning pipeline, create a configuration file in JSON format (details below). To run the analysis, execute: python analysis/classical_ml/classical_ml.py cfg_file_path.
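As a rough illustration of this workflow, the sketch below writes a small configuration file and assembles the command line. The key names follow the attribute table in this README; the table path, the chosen values, and the temporary output location are placeholders, and the pipeline may require keys that are not shown here.

```python
import json
import sys
import tempfile
from pathlib import Path

# Hypothetical minimal configuration. Key names follow the attribute table;
# every path and value is a placeholder, not a recommended setting.
config = {
    "read_mode": "table",
    "paths": {"table": "data/subjects.csv"},
    "images": {"label": "label", "use_holdout": True, "test_size": 0.2},
    "training": {
        "n_splits": 3,
        "trials": 4,
        "scorers": ["balanced_accuracy"],
        "models": [["PCA", "SVC"], ["DummyClassifier"]],
        "parameters_lst": [
            {"PCA__n_components": [5, 20], "SVC__C": [1, 10]},
            {"DummyClassifier__strategy": ["most_frequent"]},
        ],
    },
    "holdout": {"model_selection": "all_models"},
}

# Write the configuration to a temporary directory.
cfg_path = Path(tempfile.mkdtemp()) / "cfg.json"
cfg_path.write_text(json.dumps(config, indent=2))

# Assemble (but do not run) the command described above.
command = [sys.executable, "analysis/classical_ml/classical_ml.py", str(cfg_path)]
print(" ".join(command))
```

The command is only printed here; from the repository root it could be passed to `subprocess.run(command, check=True)`.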

The configuration file describes how the analysis will be run. It should be in JSON format with the following structure:

Attribute Value Remarks
read_mode String that specifies the type of data to be used Value should be one of the following: BIDS, table, h5
paths JSON Object with specification of important paths The paths that can be described here are:

  • data: path to BIDS folder
  • table: path to table with data
  • h5:{training:path_tr, holdout:path_ho}: paths to h5 files
  • atlas: path to atlas for region based analysis
images JSON Object with the description for data loading and preprocessing The possible keys (default value) are:
  • transform: list with functions that will be applied to each image
  • target_transform_map (transform labels to a range of integers) : name of an existing function
  • n_samples (null): Number of samples of the whole dataset to use, uses the entire dataset if null.
  • use_holdout (True): Flag that determines whether the data should be split into training and holdout sets.
  • split_seed (0): Seed for the train-test split.
  • test_size (null): Fraction of the dataset to use as holdout set. See sklearn.model_selection.train_test_split.
  • label ("label"): Column containing each data point's label in a tabular dataset.
  • columns (null): Array with the list of columns to use when read_mode is table. If null, all columns except the label are used.
  • stratify_cols (["label"]): List of columns used to stratify the train-test split.
  • mean_normalize (False): Flag to standardize the data.
  • minmax_normalize (False): Flag for min-max normalization.
  • mask_label (null): Name of a region to extract, ignored if null.
  • use_atlas (False): Flag to use an atlas for dimensionality reduction.
  • atlas_strategy ('mean'): Aggregation function if use_atlas is true.
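To make the use_atlas and atlas_strategy options more concrete, here is a minimal sketch of region-wise aggregation on flattened data. It is not the repository's implementation: the function name `aggregate_by_region`, the flat-list representation, and the convention that label 0 marks background are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_region(voxels, atlas_labels, strategy=mean, background=0):
    """Reduce a flat list of voxel values to one value per atlas region."""
    if len(voxels) != len(atlas_labels):
        raise ValueError("image and atlas must have the same number of voxels")
    regions = defaultdict(list)
    for value, region in zip(voxels, atlas_labels):
        if region != background:  # assumption: label 0 marks background
            regions[region].append(value)
    # One feature per region, ordered by region label.
    return {region: strategy(values) for region, values in sorted(regions.items())}


voxels = [0.0, 1.0, 3.0, 2.0, 4.0, 9.0]
atlas = [0, 1, 1, 2, 2, 2]
features = aggregate_by_region(voxels, atlas)
print(features)  # {1: 2.0, 2: 5.0}
```

Six voxels are reduced to two features, one per region, which is the dimensionality reduction the flag refers to. Passing another callable as `strategy` (for example `max`) changes the aggregation.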
training JSON Object with the description for data training The possible keys (default value) are:
  • n_splits (3): Number of folds for cross validation
  • trials (4): Number of independent experiments to perform
  • scorers (['balanced_accuracy']): List of scoring metrics to evaluate the model.
  • retrain_metric (balanced_accuracy): Metric from scorers used to retrain the model.
  • return_train_score (False): Compute and return the train scores in the GridSearchCV.
  • models: List specifying the models to be used, where each element corresponds to one independent model to analyse. Each element should be a list of strings with the names of the estimators to be chained. E.g. [['PCA', 'SVC'], ['DummyClassifier']] means that 2 models will be tested: first a pipeline that performs PCA followed by a support vector classifier, and second just a dummy classifier.
  • parameters_lst: Array of JSON objects with as many elements as models were specified. The key-value pairs of each element should map the name of a hyperparameter to a list of the values to be tried. E.g. [{"PCA__n_components": [5, 20], "SVC__C": [1, 10]}, {"DummyClassifier__strategy": ["most_frequent"]}]
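Since models and parameters_lst must line up element by element, a small consistency check can catch mistakes before a long run. The sketch below is a hypothetical helper, not part of the repository; it assumes that each hyperparameter key has the form EstimatorName__parameter and that the prefix matches one of the estimator names in the corresponding model, as the examples above suggest.

```python
def check_training_config(models, parameters_lst):
    """Return a list of problems found; an empty list means the pair is consistent."""
    problems = []
    if len(models) != len(parameters_lst):
        problems.append(
            f"{len(models)} models but {len(parameters_lst)} parameter grids"
        )
    for index, (steps, grid) in enumerate(zip(models, parameters_lst)):
        for key, values in grid.items():
            step, sep, param = key.partition("__")
            if not sep or not param:
                problems.append(f"model {index}: '{key}' is not of the form Step__param")
            elif step not in steps:
                problems.append(f"model {index}: '{key}' refers to a step not in {steps}")
            if not isinstance(values, list):
                problems.append(f"model {index}: values for '{key}' must be a list")
    return problems


models = [["PCA", "SVC"], ["DummyClassifier"]]
good = [
    {"PCA__n_components": [5, 20], "SVC__C": [1, 10]},
    {"DummyClassifier__strategy": ["most_frequent"]},
]
# One grid per estimator instead of one per model: a plausible mistake.
bad = [
    {"PCA__n_components": [5, 20]},
    {"SVC__C": [1, 10]},
    {"DummyClassifier__strategy": ["most_frequent"]},
]

print(check_training_config(models, good))
for problem in check_training_config(models, bad):
    print(problem)
```

The second call reports both the length mismatch and the SVC key that ends up paired with the DummyClassifier model.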
holdout JSON Object with the description for inference in holdout set (optional) The possible keys (default value) are:
  • scorers (['balanced_accuracy']): List of scoring metrics to evaluate the model
  • best_metric (balanced_accuracy): Metric from training.scorers to be used to select a model
  • model_selection (all_models): strategy to select a model. See model_selection
  • plotting (True): Flag to determine if the results should be saved as a graph
  • voting: Groups multiple selected models into an ensemble through a voting system. Works when model_selection is all_models
  • chosen_estimator: See model_selection
  • chosen_trial: See model_selection
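The voting option combines the selected models into an ensemble. The README does not state whether the vote is hard or soft, so the following is only a sketch of one possibility, hard (majority) voting over predicted labels; the function name `majority_vote` and the toy predictions are made up for illustration.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label predictions into one label per sample."""
    if not predictions_per_model:
        raise ValueError("need at least one model")
    n_samples = len(predictions_per_model[0])
    if any(len(preds) != n_samples for preds in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")
    ensemble = []
    for sample_votes in zip(*predictions_per_model):
        # Pick the most frequent label; ties go to the label seen first.
        ensemble.append(Counter(sample_votes).most_common(1)[0][0])
    return ensemble


# Toy predictions from three models on four samples.
preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
print(majority_vote(preds))  # [0, 1, 0, 0]
```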
model_selection It can have one of the following values:
  • all_models: Reports mean and variance across trials for all estimators
  • best_model: Selects best overall model
  • best_model_type: Selects the estimator that performed best on average according to best_metric and retrieves its models for every trial
  • best_model_type_random: Like best_model_type, but selects a random model from all the trials
  • random_model: A random model is selected
  • Selects a model from a random trial for all estimators
  • specific_model: Selects the models from chosen_estimator for the trial chosen_trial
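The difference between some of these strategies can be sketched with a table of scores per estimator and trial. This is an illustration under assumptions, not the repository's code: the helper names are hypothetical, the scores are invented toy numbers, and "LogisticRegression" is added only to make the contrast visible.

```python
from statistics import mean


def best_model(scores):
    """Return (estimator, trial) for the single highest score overall."""
    candidates = (
        (name, trial)
        for name, trial_scores in scores.items()
        for trial in range(len(trial_scores))
    )
    return max(candidates, key=lambda pair: scores[pair[0]][pair[1]])


def best_model_type(scores):
    """Return the estimator with the highest mean score across trials."""
    return max(scores, key=lambda name: mean(scores[name]))


def specific_model(scores, chosen_estimator, chosen_trial):
    """Return the score of one explicitly chosen estimator and trial."""
    return scores[chosen_estimator][chosen_trial]


# Invented values of best_metric, one entry per trial.
scores = {
    "PCA+SVC": [0.70, 0.72, 0.90, 0.60],
    "DummyClassifier": [0.50, 0.50, 0.50, 0.50],
    "LogisticRegression": [0.78, 0.80, 0.79, 0.77],
}

print(best_model(scores))       # ('PCA+SVC', 2)
print(best_model_type(scores))  # LogisticRegression
print(specific_model(scores, "DummyClassifier", 0))  # 0.5
```

In this toy table the single best run belongs to one estimator while another is better on average, which is why best_model and best_model_type can select different models.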
