Commit fe3c015: one sentence per line
jjc2718 committed Jun 12, 2023 (1 parent: 131ae24)
1 changed file with 10 additions and 4 deletions: 01_stratified_classification/README.md

## Running experiments

To train classifiers and generate the primary results files for the optimization comparison (e.g. classification metrics, best model coefficients, loss function curves), run the `run_stratified_lasso_penalty.py` script.
By default, this will use the `liblinear` (coordinate descent) optimizer, unless the `--sgd` flag is included, in which case it will use SGD (stochastic gradient descent).
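A minimal sketch of the two invocations (assuming the script is run from this directory; the exact interface may differ, so check the script's own documentation):

```sh
# default optimizer: coordinate descent via liblinear
python run_stratified_lasso_penalty.py

# use SGD instead (assumed invocation using the flag described above)
python run_stratified_lasso_penalty.py --sgd
```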

If the `--sgd` flag is included, the `--sgd_lr_schedule` argument can be used to select the learning rate schedule.
The default is `optimal` (this is the scikit-learn default), but most experiments in the paper use the `constant_search` option.
Other options are described in the [scikit-learn `SGDClassifier` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html), under the `learning_rate` parameter.
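For example, the schedule used for most experiments in the paper could be selected as follows (an assumed invocation built from the flags described above):

```sh
# SGD with the constant_search learning rate schedule
python run_stratified_lasso_penalty.py --sgd --sgd_lr_schedule constant_search
```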

The `--num_features` argument can be used for feature selection.
This will default to 8000 features selected by median absolute deviation.
For the experiments in the paper, we used 16042 features, which is all of the features in the preprocessed TCGA gene expression dataset.
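As a sketch, selecting all of the preprocessed features rather than the 8000-feature default might look like this (assumed invocation):

```sh
# use all 16042 features from the preprocessed TCGA gene expression data
python run_stratified_lasso_penalty.py --num_features 16042
```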

The script will write output to the `--results_dir` directory, which defaults to `01_stratified_classification/results`.
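To redirect output, the default directory can be overridden, for example (the path below is only a placeholder):

```sh
# write results to a custom location instead of 01_stratified_classification/results
python run_stratified_lasso_penalty.py --results_dir /path/to/custom_results
```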

## Analysis and visualization of results

The Jupyter notebooks in this directory can be used to visualize the results generated by `run_stratified_lasso_penalty.py` and ultimately to generate the figures in the paper, as described above in the "repository layout" section.
Each of these notebooks has a `results_dir` variable (or multiple variables) defined near the top, which can either be set manually or modified programmatically using [papermill command line arguments](https://papermill.readthedocs.io/en/latest/).
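As a hedged example of the papermill route (the notebook filenames below are placeholders, not actual files in this directory; `-p` sets a notebook parameter by name):

```sh
# execute a notebook with results_dir overridden on the command line
papermill some_analysis_notebook.ipynb some_analysis_notebook_output.ipynb \
  -p results_dir 01_stratified_classification/results
```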
