machine_learning

Schizophrenia

Repository for the analysis of a schizophrenia dataset.

This project needs Python 3 and the dependencies specified in `environment.yml`.

To use the classical machine learning pipeline, create a configuration file in JSON format (details below). To run the analysis, execute `python analysis/classical_ml/classical_ml.py cfg_file_path`, where `cfg_file_path` is the path to the configuration file.

The configuration file describes how the analysis will be run. It should be in JSON format with the following structure:

| Attribute | Value | Remarks |
| --- | --- | --- |
| `read_mode` | String that specifies the type of data to be used | One of `BIDS`, `table`, `h5` |
| `paths` | JSON object specifying the relevant paths | The paths that can be described here are:<br>`data`: path to the BIDS folder<br>`table`: path to the table with data<br>`h5:{training:path_tr, holdout:path_ho}`: paths to the h5 files<br>`atlas`: path to the atlas for region-based analysis |
| `images` | JSON object describing data loading and preprocessing | The possible keys (default value) are:<br>`transform`: list of functions that will be applied to each image<br>`target_transform_map` (transform labels to a range of integers): name of an existing function<br>`n_samples` (null): number of samples of the whole dataset to use; the entire dataset is used if null<br>`use_holdout` (True): flag that determines whether the data should be split into training and test sets<br>`split_seed` (0): seed for the train-test split<br>`test_size` (null): fraction of the dataset to use as holdout set; see `sklearn.model_selection.train_test_split`<br>`label` ("label"): column that holds the label of each data point in tabular data<br>`columns` (null): array with the columns to use when `read_mode` is `table`; if null, all columns but the label are used<br>`stratify_cols` (["label"]): list of columns used to stratify the train-test split<br>`mean_normalize` (False): flag to standardize the data<br>`minmax_normalize` (False): flag for min-max normalization<br>`mask_label` (null): name of a region to extract; ignored if null<br>`use_atlas` (False): flag to use an atlas for dimensionality reduction<br>`atlas_strategy` ('mean'): aggregation function if `use_atlas` is true |
| `training` | JSON object describing model training | The possible keys (default value) are:<br>`n_splits` (3): number of folds for cross-validation<br>`trials` (4): number of independent experiments to perform<br>`scorers` (['balanced_accuracy']): list of scoring metrics to evaluate the model<br>`retrain_metric` (balanced_accuracy): metric from `scorers` to be used to retrain the model<br>`return_train_score` (False): compute and return the train scores in the GridSearchCV<br>`models`: list of the models to be used, where each element corresponds to one independent model to analyse. Each element should be a list of strings with the names of the estimators to be chained. E.g. `[["PCA", "SVC"], ["DummyClassifier"]]` means that 2 models will be tested: first a pipeline that performs PCA followed by a support vector classifier, and second a dummy classifier on its own<br>`parameters_lst`: array of JSON objects with as many elements as models were specified. The key-value pairs of each element should map the name of a hyperparameter to a list of the values to be tried. E.g. `[{"PCA__n_components": [5, 20], "SVC__C": [1, 10]}, {"DummyClassifier__strategy": ["most_frequent"]}]` |
| `holdout` | JSON object describing inference on the holdout set (optional) | The possible keys (default value) are:<br>`scorers` (['balanced_accuracy']): list of scoring metrics to evaluate the model<br>`best_metric` (balanced_accuracy): metric from `training.scorers` to be used to select a model<br>`model_selection` (all_models): strategy to select a model; see the `model_selection` row<br>`plotting` (True): flag that determines whether the results should be saved as a graph<br>`voting`: groups multiple selected models into an ensemble through a voting system; works when `model_selection` is `all_models`<br>`chosen_estimator`: see the `model_selection` row<br>`chosen_trial`: see the `model_selection` row |
| | `model_selection` | It can have one of the following values:<br>`all_models`: reports mean and variance across trials for all estimators<br>`best_model`: selects the best overall model<br>`best_model_type`: selects the estimator that performed best on average according to `best_metric` and retrieves its models for every trial<br>`best_model_type_random`: as `best_model_type`, but selects a random model from all the trials<br>`random_model`: selects a model from a random trial for all estimators<br>`specific_model`: selects the models from `chosen_estimator` for the trial `chosen_trial` |
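
As an illustration of the structure above, the following is a minimal sketch that builds a configuration as a Python dictionary, runs a few sanity checks, and writes it out as JSON. The file names (`data/subjects.csv`, `cfg.json`), the chosen values, and the helpers `check_config` and `write_config` are hypothetical and not part of this repository. The check on hyperparameter names assumes the `<estimator>__<parameter>` convention suggested by the `parameters_lst` example; the pipeline itself may validate the file differently.

```python
import json
from pathlib import Path

# Hypothetical example values: paths and hyperparameter grids are placeholders.
config = {
    "read_mode": "table",
    "paths": {"table": "data/subjects.csv"},
    "images": {
        "label": "label",
        "use_holdout": True,
        "test_size": 0.2,
        "split_seed": 0,
        "stratify_cols": ["label"],
        "mean_normalize": True,
    },
    "training": {
        "n_splits": 3,
        "trials": 4,
        "scorers": ["balanced_accuracy"],
        "retrain_metric": "balanced_accuracy",
        # Two models: a PCA + SVC pipeline and a dummy baseline.
        "models": [["PCA", "SVC"], ["DummyClassifier"]],
        # One hyperparameter grid per model, in the same order as "models".
        "parameters_lst": [
            {"PCA__n_components": [5, 20], "SVC__C": [1, 10]},
            {"DummyClassifier__strategy": ["most_frequent"]},
        ],
    },
    "holdout": {
        "scorers": ["balanced_accuracy"],
        "best_metric": "balanced_accuracy",
        "model_selection": "best_model",
        "plotting": False,
    },
}


def check_config(cfg):
    """Return a list of problems found; an empty list means the checks passed.

    These checks are derived from the table above and are only illustrative.
    """
    problems = []
    if cfg.get("read_mode") not in {"BIDS", "table", "h5"}:
        problems.append("read_mode must be one of BIDS, table, h5")
    training = cfg.get("training", {})
    models = training.get("models", [])
    grids = training.get("parameters_lst", [])
    if len(models) != len(grids):
        problems.append("parameters_lst needs exactly one entry per model")
    for estimators, grid in zip(models, grids):
        for key in grid:
            # Assumed convention: '<estimator>__<parameter>'.
            if key.split("__", 1)[0] not in estimators:
                problems.append(f"{key} does not match any estimator in {estimators}")
    return problems


def write_config(cfg, path):
    """Write cfg to path as JSON, refusing to write if a check fails."""
    problems = check_config(cfg)
    if problems:
        raise ValueError("; ".join(problems))
    Path(path).write_text(json.dumps(cfg, indent=2))
```

Calling `write_config(config, "cfg.json")` would produce a file that can then be passed as `cfg_file_path` to the analysis script. Python booleans are serialized as JSON `true` and `false`.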
