rename workflow file and add workflow test

galaxyproject · May 21, 2024 · 6439ff1
2 parents 502f6e6 + e6df909

Showing 4 changed files with 630 additions and 669 deletions.
diff --git a/topics/statistics/tutorials/classification_machinelearning/tutorial.md b/topics/statistics/tutorials/classification_machinelearning/tutorial.md
@@ -436,7 +436,7 @@ The visualization tool creates the following ROC plot:
 
 # Create data processing pipeline
 
-At the last step, we will create a bagging classifier by using  the **Pipeline builder** tool. Bagging or Bootstrap Aggregating is a widely used ensemble learning algorithm in machine learning. The bagging algorithm creates multiple models from randomly taken subsets of the training dataset and then aggregates learners to build overall stronger classifiers that combine the predictions to produce a final prediction. The **Pipeline builder** tool builds the classifier and returns a zipped file. This tool creates another file which is tabular and contains a list of all the different hyperparameters of the preprocessors and estimators. This tabular file will be used in the **Hyperparameter search** tool to populate the list of hyperparameters with their respective (default) values.
+At the last step, we will create a bagging classifier by using  the **Pipeline builder** tool. Bagging or Bootstrap Aggregating is a widely used ensemble learning algorithm in machine learning. The bagging algorithm creates multiple models from randomly taken subsets of the training dataset and then aggregates learners to build overall stronger classifiers that combine the predictions to produce a final prediction. The **Pipeline builder** tool builds the classifier and returns a `h5mlm` file. This tool creates another file which is tabular and contains a list of all the different hyperparameters of the preprocessors and estimators. This tabular file will be used in the **Hyperparameter search** tool to populate the list of hyperparameters with their respective (default) values.
 
 > <hands-on-title>Create pipeline</hands-on-title>
 >
@@ -499,7 +499,7 @@ Using the **Hyperparameter search** tool, we found the best model, based on the
 >
 > 1. {% tool [Ensemble methods for classification and regression](toolshed.g2.bx.psu.edu/repos/bgruening/sklearn_ensemble/sklearn_ensemble/1.0.11.0) %}:
 >    - *"Select a Classification Task"*: `Load a model and predict`
->        - {% icon param-files %} *"Models"*: `zipped` file (output of **Hyperparameter search** {% icon tool %})
+>        - {% icon param-files %} *"Models"*: `h5mlm` file (output of **Hyperparameter search** {% icon tool %})
 >        - {% icon param-files %} *"Data (tabular)"*: `test_rows` tabular file
 >        - *"Does the dataset contain header"*: `Yes`
 >

diff --git a/topics/statistics/tutorials/classification_machinelearning/workflows/ml_classification.ga b/topics/statistics/tutorials/classification_machinelearning/workflows/ml_classification.ga
@@ -83,7 +83,7 @@
             "name": "Input dataset",
             "outputs": [],
             "position": {
-                "left": 0,
+                "left": 0.0,
                 "top": 630.9083691436052
             },
             "tool_id": null,
@@ -240,6 +240,7 @@
             ],
             "position": {
                 "left": 228,
+                "left": 228.0,
                 "top": 770.8979524769386
             },
             "post_job_actions": {},
@@ -386,7 +387,7 @@
                 "owner": "bgruening",
                 "tool_shed": "toolshed.g2.bx.psu.edu"
             },
-            "tool_state": "{\"infile_estimator\": {\"__class__\": \"ConnectedValue\"}, \"input_options\": {\"selected_input\": \"tabular\", \"__current_case__\": 0, \"infile1\": {\"__class__\": \"ConnectedValue\"}, \"header1\": true, \"column_selector_options_1\": {\"selected_column_selector_option\": \"all_but_by_header_name\", \"__current_case__\": 3, \"col1\": \"Class\"}, \"infile2\": {\"__class__\": \"ConnectedValue\"}, \"header2\": true, \"column_selector_options_2\": {\"selected_column_selector_option2\": \"by_header_name\", \"__current_case__\": 2, \"col2\": \"Class\"}}, \"is_deep_learning\": false, \"options\": {\"scoring\": {\"primary_scoring\": \"default\", \"__current_case__\": 0}, \"cv_selector\": {\"selected_cv\": \"default\", \"__current_case__\": 0, \"n_splits\": \"5\"}, \"verbose\": \"0\", \"error_score\": true, \"return_train_score\": false}, \"outer_split\": {\"split_mode\": \"no\", \"__current_case__\": 0}, \"save\": \"save_estimator\", \"search_algos\": {\"selected_search_algo\": \"GridSearchCV\", \"__current_case__\": 0}, \"search_params_builder\": {\"param_set\": [{\"__index__\": 0, \"sp_name\": null, \"sp_list\": \"\"}]}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+            "tool_state": "{\"infile_estimator\": {\"__class__\": \"RuntimeValue\"}, \"input_options\": {\"selected_input\": \"tabular\", \"__current_case__\": 0, \"infile1\": {\"__class__\": \"RuntimeValue\"}, \"header1\": true, \"column_selector_options_1\": {\"selected_column_selector_option\": \"all_but_by_header_name\", \"__current_case__\": 3, \"col1\": \"Class\"}, \"infile2\": {\"__class__\": \"RuntimeValue\"}, \"header2\": true, \"column_selector_options_2\": {\"selected_column_selector_option2\": \"by_header_name\", \"__current_case__\": 2, \"col2\": \"Class\"}}, \"is_deep_learning\": false, \"options\": {\"scoring\": {\"primary_scoring\": \"default\", \"__current_case__\": 0}, \"cv_selector\": {\"selected_cv\": \"default\", \"__current_case__\": 0, \"n_splits\": \"5\"}, \"verbose\": \"0\", \"error_score\": true, \"return_train_score\": false}, \"outer_split\": {\"split_mode\": \"no\", \"__current_case__\": 0}, \"save\": \"save_estimator\", \"search_algos\": {\"selected_search_algo\": \"GridSearchCV\", \"__current_case__\": 0}, \"search_params_builder\": {\"param_set\": [{\"__index__\": 0, \"sp_name\": null, \"sp_list\": \"\"}]}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
             "tool_version": "1.0.11.0",
             "type": "tool",
             "uuid": "b5cd9178-e9d7-4e7c-b5df-d29a351328f9",
@@ -531,7 +532,7 @@
                 }
             ],
             "position": {
-                "left": 480,
+                "left": 480.0,
                 "top": 676.2416815128128
             },
             "post_job_actions": {},

diff --git a/topics/statistics/tutorials/regression_machinelearning/tutorial.md b/topics/statistics/tutorials/regression_machinelearning/tutorial.md
@@ -334,10 +334,9 @@ After the **New Pipeline/Estimator** dataset and its tunable hyperparameters are
 >
 > {% tool [Hyperparameter search](toolshed.g2.bx.psu.edu/repos/bgruening/sklearn_searchcv/sklearn_searchcv/1.0.11.0) %}:
 >    - *"Select a model selection search scheme"*: `GridSearchCV - Exhaustive search over specified parameter values for an estimator `
->        - {% icon param-files %} *"Choose the dataset containing pipeline/estimator object"*: `zipped` file (output of **Pipeline builder** {% icon tool %})
+>        - {% icon param-files %} *"Choose the dataset containing pipeline/estimator object"*: `h5mlm` file (output of **Pipeline builder** {% icon tool %})
 >        - *"Is the estimator a deep learning model?"*: `No`
 >        - In *"Search parameters Builder"*:
->             - {% icon param-files %} *"Choose the dataset containing parameter names"*: `tabular` file (the other output of **Pipeline builder** {% icon tool %})
 >             - In *"Parameter settings for search"*:
 >                 - {% icon param-repeat %} *"1: Parameter settings for search"*
 >                    - *"Choose a parameter name (with current value)"*: `n_estimators: 100`
@@ -395,7 +394,7 @@ Using the **Hyperparameter search** tool, we optimized our model, based on the t
 >
 > 1. {% tool [Ensemble methods for classification and regression](toolshed.g2.bx.psu.edu/repos/bgruening/sklearn_ensemble/sklearn_ensemble/1.0.11.0) %}:
 >    - *"Select a Classification Task"*: `Load a model and predict`
->        - {% icon param-files %} *"Models"*: `zipped` file (output of **Hyperparameter search** {% icon tool %})
+>        - {% icon param-files %} *"Models"*: `h5mlm` file (output of **Hyperparameter search** {% icon tool %})
 >        - {% icon param-files %} *"Data (tabular)"*: `test_rows` tabular file
 >        - *"Does the dataset contain header"*: `Yes`
 >
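
Note on the `tool_state` hunk in `ml_classification.ga`: the value is a JSON string embedded in the workflow JSON, so the switch from `ConnectedValue` to `RuntimeValue` on three inputs is easy to miss in the one-line diff. As generally understood, `ConnectedValue` marks an input wired to an upstream step and `RuntimeValue` marks one supplied when the workflow is run. The sketch below is one way to list those markers for review. It is not part of this commit; the helper name `input_classes` is hypothetical, and the sample state is an abbreviated excerpt of the new value, not the full string.

```python
import json


def input_classes(tool_state, prefix=""):
    """Return {dotted.parameter.path: marker} for every "__class__" entry.

    `tool_state` may be the raw JSON string stored in the .ga file or an
    already-decoded dict/list. (Hypothetical helper for reviewing diffs.)
    """
    state = json.loads(tool_state) if isinstance(tool_state, str) else tool_state
    found = {}
    if isinstance(state, dict):
        marker = state.get("__class__")
        if marker is not None:
            found[prefix.rstrip(".")] = marker
        children = state.items()
    elif isinstance(state, list):
        children = enumerate(state)
    else:
        return found
    for key, value in children:
        # Only containers can hold further markers; scalars are skipped.
        if isinstance(value, (dict, list)):
            found.update(input_classes(value, f"{prefix}{key}."))
    return found


# Abbreviated excerpt of the new tool_state value from the hunk above.
new_state = json.dumps({
    "infile_estimator": {"__class__": "RuntimeValue"},
    "input_options": {
        "selected_input": "tabular",
        "infile1": {"__class__": "RuntimeValue"},
        "infile2": {"__class__": "RuntimeValue"},
    },
    "is_deep_learning": False,
})

for path, marker in sorted(input_classes(new_state).items()):
    print(f"{path}: {marker}")
```

Running the same helper on the old and new `tool_state` strings and comparing the two dictionaries would show exactly which inputs changed marker, which may help decide whether the change was intended or is a side effect of re-exporting the workflow.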