From fe3c0154880c54936d656de9235b6797f47a121f Mon Sep 17 00:00:00 2001
From: Jake Crawford
Date: Mon, 12 Jun 2023 13:55:44 -0400
Subject: [PATCH] one sentence per line

---
 01_stratified_classification/README.md | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/01_stratified_classification/README.md b/01_stratified_classification/README.md
index da83e16..3edb2f3 100644
--- a/01_stratified_classification/README.md
+++ b/01_stratified_classification/README.md
@@ -28,14 +28,20 @@ The parent directory README also contains instructions for running tests to ensu
 
 ## Running experiments
 
-To train classifiers and generate the primary results files for the optimization comparison (e.g. classification metrics, best model coefficients, loss function curves), run the `run_stratified_lasso_penalty.py` script. By default, this will use the `liblinear` (coordinate descent) optimizer, unless the `--sgd` flag is included in which case it will use SGD.
+To train classifiers and generate the primary results files for the optimization comparison (e.g. classification metrics, best model coefficients, loss function curves), run the `run_stratified_lasso_penalty.py` script.
+By default, this will use the `liblinear` (coordinate descent) optimizer, unless the `--sgd` flag is included in which case it will use SGD.
 
-If the `--sgd` flag is included, the `--sgd_lr_schedule` argument can be used to select the learning rate schedule. The default is `optimal` (this is the scikit-learn default), but most experiments in the paper use the `constant_search` option. Other options are described [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html), under the `learning_rate` function argument.
+If the `--sgd` flag is included, the `--sgd_lr_schedule` argument can be used to select the learning rate schedule.
+The default is `optimal` (this is the scikit-learn default), but most experiments in the paper use the `constant_search` option.
+Other options are described [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html), under the `learning_rate` function argument.
 
-The `--num_features` argument can be used for feature selection. This will default to 8000 features selected by median absolute deviation. For the experiments in the paper we used 16042 features, which is all of the features in the preprocessed TCGA gene expression dataset.
+The `--num_features` argument can be used for feature selection.
+This will default to 8000 features selected by median absolute deviation.
+For the experiments in the paper we used 16042 features, which is all of the features in the preprocessed TCGA gene expression dataset.
 
 The script will write output to the `--results_dir` directory, which defaults to `01_stratified_classification/results`.
 
 ## Analysis and visualization of results
 
-The Jupyter notebooks in this directory can be used to visualize the results generated by `run_stratified_lasso_penalty.py` and ultimately to generate the figures in the paper, as described above in the "repository layout" section. Each of these scripts has a `results_dir` variable (or multiple variables) defined near the top, which can either be set manually or modified programmatically using [papermill commmand line arguments](https://papermill.readthedocs.io/en/latest/).
+The Jupyter notebooks in this directory can be used to visualize the results generated by `run_stratified_lasso_penalty.py` and ultimately to generate the figures in the paper, as described above in the "repository layout" section.
+Each of these scripts has a `results_dir` variable (or multiple variables) defined near the top, which can either be set manually or modified programmatically using [papermill commmand line arguments](https://papermill.readthedocs.io/en/latest/).