rename workflow file and add workflow test
anuprulez committed May 21, 2024
2 parents 502f6e6 + e6df909 commit 6439ff1
Showing 4 changed files with 630 additions and 669 deletions.
@@ -436,7 +436,7 @@ The visualization tool creates the following ROC plot:

# Create data processing pipeline

-At the last step, we will create a bagging classifier by using the **Pipeline builder** tool. Bagging or Bootstrap Aggregating is a widely used ensemble learning algorithm in machine learning. The bagging algorithm creates multiple models from randomly taken subsets of the training dataset and then aggregates learners to build overall stronger classifiers that combine the predictions to produce a final prediction. The **Pipeline builder** tool builds the classifier and returns a zipped file. This tool creates another file which is tabular and contains a list of all the different hyperparameters of the preprocessors and estimators. This tabular file will be used in the **Hyperparameter search** tool to populate the list of hyperparameters with their respective (default) values.
+At the last step, we will create a bagging classifier by using the **Pipeline builder** tool. Bagging or Bootstrap Aggregating is a widely used ensemble learning algorithm in machine learning. The bagging algorithm creates multiple models from randomly taken subsets of the training dataset and then aggregates learners to build overall stronger classifiers that combine the predictions to produce a final prediction. The **Pipeline builder** tool builds the classifier and returns a `h5mlm` file. This tool creates another file which is tabular and contains a list of all the different hyperparameters of the preprocessors and estimators. This tabular file will be used in the **Hyperparameter search** tool to populate the list of hyperparameters with their respective (default) values.

> <hands-on-title>Create pipeline</hands-on-title>
>
@@ -499,7 +499,7 @@ Using the **Hyperparameter search** tool, we found the best model, based on the
>
> 1. {% tool [Ensemble methods for classification and regression](toolshed.g2.bx.psu.edu/repos/bgruening/sklearn_ensemble/sklearn_ensemble/1.0.11.0) %}:
> - *"Select a Classification Task"*: `Load a model and predict`
-> - {% icon param-files %} *"Models"*: `zipped` file (output of **Hyperparameter search** {% icon tool %})
+> - {% icon param-files %} *"Models"*: `h5mlm` file (output of **Hyperparameter search** {% icon tool %})
> - {% icon param-files %} *"Data (tabular)"*: `test_rows` tabular file
> - *"Does the dataset contain header"*: `Yes`
>
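The bagging procedure described in the tutorial paragraph of this diff (draw bootstrap samples from the training data, fit one model per sample, combine their predictions) can be illustrated with a short, dependency-free Python sketch. It is not the code behind the **Pipeline builder** tool or scikit-learn's `BaggingClassifier`; the decision-stump base learner, the majority-vote aggregation and the helper names `fit_stump`, `bagging_fit` and `bagging_predict` are assumptions chosen for illustration.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical base learner, a decision stump:
# (feature index, threshold, label if value <= threshold, label otherwise)
Stump = Tuple[int, float, int, int]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Fit a one-split decision tree by trying every feature/threshold pair."""
    overall = Counter(y).most_common(1)[0][0]  # fallback label for an empty side
    best, best_err = None, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0] if left else overall
            r_lab = Counter(right).most_common(1)[0][0] if right else overall
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best_err is None or err < best_err:
                best, best_err = (j, t, l_lab, r_lab), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def bagging_fit(X, y, n_estimators: int = 10, seed: int = 0) -> List[Stump]:
    """Train one stump per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap indices
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models: List[Stump], row: Sequence[float]) -> int:
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Usage on a toy, linearly separable 1-D dataset
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(X, y, n_estimators=25, seed=0)
print(bagging_predict(models, [0.0]), bagging_predict(models, [13.0]))
```

Majority voting keeps the sketch short; scikit-learn's `BaggingClassifier`, by contrast, averages predicted class probabilities when its base estimator supports them, so its predictions can differ from a plain vote.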
@@ -83,7 +83,7 @@
"name": "Input dataset",
"outputs": [],
"position": {
-"left": 0,
+"left": 0.0,
"top": 630.9083691436052
},
"tool_id": null,
@@ -240,6 +240,7 @@
],
"position": {
-"left": 228,
+"left": 228.0,
"top": 770.8979524769386
},
"post_job_actions": {},
@@ -386,7 +387,7 @@
"owner": "bgruening",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
-"tool_state": "{\"infile_estimator\": {\"__class__\": \"ConnectedValue\"}, \"input_options\": {\"selected_input\": \"tabular\", \"__current_case__\": 0, \"infile1\": {\"__class__\": \"ConnectedValue\"}, \"header1\": true, \"column_selector_options_1\": {\"selected_column_selector_option\": \"all_but_by_header_name\", \"__current_case__\": 3, \"col1\": \"Class\"}, \"infile2\": {\"__class__\": \"ConnectedValue\"}, \"header2\": true, \"column_selector_options_2\": {\"selected_column_selector_option2\": \"by_header_name\", \"__current_case__\": 2, \"col2\": \"Class\"}}, \"is_deep_learning\": false, \"options\": {\"scoring\": {\"primary_scoring\": \"default\", \"__current_case__\": 0}, \"cv_selector\": {\"selected_cv\": \"default\", \"__current_case__\": 0, \"n_splits\": \"5\"}, \"verbose\": \"0\", \"error_score\": true, \"return_train_score\": false}, \"outer_split\": {\"split_mode\": \"no\", \"__current_case__\": 0}, \"save\": \"save_estimator\", \"search_algos\": {\"selected_search_algo\": \"GridSearchCV\", \"__current_case__\": 0}, \"search_params_builder\": {\"param_set\": [{\"__index__\": 0, \"sp_name\": null, \"sp_list\": \"\"}]}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+"tool_state": "{\"infile_estimator\": {\"__class__\": \"RuntimeValue\"}, \"input_options\": {\"selected_input\": \"tabular\", \"__current_case__\": 0, \"infile1\": {\"__class__\": \"RuntimeValue\"}, \"header1\": true, \"column_selector_options_1\": {\"selected_column_selector_option\": \"all_but_by_header_name\", \"__current_case__\": 3, \"col1\": \"Class\"}, \"infile2\": {\"__class__\": \"RuntimeValue\"}, \"header2\": true, \"column_selector_options_2\": {\"selected_column_selector_option2\": \"by_header_name\", \"__current_case__\": 2, \"col2\": \"Class\"}}, \"is_deep_learning\": false, \"options\": {\"scoring\": {\"primary_scoring\": \"default\", \"__current_case__\": 0}, \"cv_selector\": {\"selected_cv\": \"default\", \"__current_case__\": 0, \"n_splits\": \"5\"}, \"verbose\": \"0\", \"error_score\": true, \"return_train_score\": false}, \"outer_split\": {\"split_mode\": \"no\", \"__current_case__\": 0}, \"save\": \"save_estimator\", \"search_algos\": {\"selected_search_algo\": \"GridSearchCV\", \"__current_case__\": 0}, \"search_params_builder\": {\"param_set\": [{\"__index__\": 0, \"sp_name\": null, \"sp_list\": \"\"}]}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
"tool_version": "1.0.11.0",
"type": "tool",
"uuid": "b5cd9178-e9d7-4e7c-b5df-d29a351328f9",
@@ -531,7 +532,7 @@
}
],
"position": {
-"left": 480,
+"left": 480.0,
"top": 676.2416815128128
},
"post_job_actions": {},
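In the workflow file changed by this commit, each step's `tool_state` is itself a JSON document serialized into a string, so finding which inputs carry the `ConnectedValue` or `RuntimeValue` markers takes two decoding passes. The Python sketch below assumes only the structure visible in this diff (a step object with a `tool_state` string whose inputs look like `{"__class__": "RuntimeValue"}`); the helper names `find_marked_params` and `runtime_inputs` are hypothetical.

```python
import json
from typing import Any, Dict, List


def find_marked_params(state: Any, marker: str, path: str = "") -> List[str]:
    """Return dotted paths of every value equal to {"__class__": marker}."""
    found: List[str] = []
    if isinstance(state, dict):
        if state.get("__class__") == marker:
            return [path]
        for key, value in state.items():
            child = f"{path}.{key}" if path else key
            found.extend(find_marked_params(value, marker, child))
    elif isinstance(state, list):
        for i, item in enumerate(state):
            found.extend(find_marked_params(item, marker, f"{path}[{i}]"))
    return found


def runtime_inputs(step: Dict[str, Any]) -> List[str]:
    """Decode a step's nested tool_state string and list its RuntimeValue inputs."""
    state = json.loads(step["tool_state"])  # second decoding pass
    return find_marked_params(state, "RuntimeValue")


# A trimmed-down step modelled on the updated tool_state line in this diff
step = {
    "tool_state": json.dumps({
        "infile_estimator": {"__class__": "RuntimeValue"},
        "input_options": {
            "selected_input": "tabular",
            "infile1": {"__class__": "RuntimeValue"},
            "header1": True,
            "infile2": {"__class__": "RuntimeValue"},
        },
        "is_deep_learning": False,
        "search_params_builder": {
            "param_set": [{"__index__": 0, "sp_name": None, "sp_list": ""}]
        },
    })
}
print(runtime_inputs(step))
```

For a complete workflow file, the same helper could be applied to each entry of the top-level `steps` mapping after reading the file with `json.load`; that layout is assumed from the Galaxy workflow format rather than shown in this diff.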
@@ -334,10 +334,9 @@ After the **New Pipeline/Estimator** dataset and its tunable hyperparameters are
>
> {% tool [Hyperparameter search](toolshed.g2.bx.psu.edu/repos/bgruening/sklearn_searchcv/sklearn_searchcv/1.0.11.0) %}:
> - *"Select a model selection search scheme"*: `GridSearchCV - Exhaustive search over specified parameter values for an estimator `
-> - {% icon param-files %} *"Choose the dataset containing pipeline/estimator object"*: `zipped` file (output of **Pipeline builder** {% icon tool %})
+> - {% icon param-files %} *"Choose the dataset containing pipeline/estimator object"*: `h5mlm` file (output of **Pipeline builder** {% icon tool %})
> - *"Is the estimator a deep learning model?"*: `No`
> - In *"Search parameters Builder"*:
> - {% icon param-files %} *"Choose the dataset containing parameter names"*: `tabular` file (the other output of **Pipeline builder** {% icon tool %})
> - In *"Parameter settings for search"*:
> - {% icon param-repeat %} *"1: Parameter settings for search"*
> - *"Choose a parameter name (with current value)"*: `n_estimators: 100`
@@ -395,7 +394,7 @@ Using the **Hyperparameter search** tool, we optimized our model, based on the t
>
> 1. {% tool [Ensemble methods for classification and regression](toolshed.g2.bx.psu.edu/repos/bgruening/sklearn_ensemble/sklearn_ensemble/1.0.11.0) %}:
> - *"Select a Classification Task"*: `Load a model and predict`
-> - {% icon param-files %} *"Models"*: `zipped` file (output of **Hyperparameter search** {% icon tool %})
+> - {% icon param-files %} *"Models"*: `h5mlm` file (output of **Hyperparameter search** {% icon tool %})
> - {% icon param-files %} *"Data (tabular)"*: `test_rows` tabular file
> - *"Does the dataset contain header"*: `Yes`
>
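The **Hyperparameter search** step in this commit uses `GridSearchCV - Exhaustive search over specified parameter values for an estimator`, and the workflow's `tool_state` sets `n_splits` to `5`. The idea of scoring every combination in a parameter grid by k-fold cross-validation can be sketched in plain Python. This is an illustration under simplifying assumptions, not the Galaxy tool or scikit-learn code: the folds are contiguous and unshuffled, and a 1-D k-nearest-neighbours classifier (tuning its `k`) stands in for the bagging ensemble and its `n_estimators`. All function names are hypothetical.

```python
import itertools
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple


def kfold_indices(n: int, n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for contiguous, unshuffled k-fold CV."""
    sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def knn_predict(train_X: Sequence[float], train_y: Sequence[int], x: float, k: int) -> int:
    """Majority label among the k training points closest to x (1-D features)."""
    nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def grid_search(X, y, param_grid: Dict[str, List[int]], n_splits: int = 5):
    """Score every parameter combination by mean CV accuracy and keep the best."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, test in kfold_indices(len(X), n_splits):
            tX = [X[i] for i in train]
            ty = [y[i] for i in train]
            correct = sum(knn_predict(tX, ty, X[i], **params) == y[i] for i in test)
            fold_scores.append(correct / len(test))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, mean_score
    return best_params, best_score


X = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(grid_search(X, y, {"k": [1, 3]}, n_splits=5))
```

The sketch assumes `n_splits` does not exceed the number of samples; otherwise some test folds would be empty and the accuracy division would fail.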