
Commit e506ea6

Merge branch 'main' into yuanchi2807-patch-1
2 parents 5b5bcc0 + bccf43e commit e506ea6

7 files changed (+2359, -1923 lines)


deploy/ibm_cloud_code_engine/README.md

Lines changed: 2 additions & 1 deletion
@@ -51,7 +51,8 @@ export NAMESPACE=<namespace from above>
 
 Update With the following command you can download a basic Ray cluster definition and customize it for your namespace:
 ```shell
-sed "s/NAMESPACE/$NAMESPACE/" > ./example-cluster.yaml
+cd ./deploy/ibm_cloud_code_engine/
+sed "s/NAMESPACE/$NAMESPACE/" ./example-cluster.yaml.template > ./example-cluster.yaml
 ```
 
 This reference deployment file will create a Ray cluster with following characteristics:
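A minimal sketch of how the corrected command sequence is meant to be used; the namespace value `my-project-ns` is a made-up placeholder for illustration, not something from this commit:

```shell
# Namespace created earlier in the instructions; the value here is hypothetical
export NAMESPACE=my-project-ns

# Run the substitution from the directory that holds the template
cd ./deploy/ibm_cloud_code_engine/
sed "s/NAMESPACE/$NAMESPACE/" ./example-cluster.yaml.template > ./example-cluster.yaml

# The generated example-cluster.yaml now carries concrete values, e.g.
#   namespace: my-project-ns
#   serviceAccountName: my-project-ns-writer
```

The previous command gave `sed` no input file, so the shell truncated `example-cluster.yaml` via the redirection while `sed` waited on stdin; renaming the file to a `.template` and passing it explicitly as input avoids that.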

deploy/ibm_cloud_code_engine/example-cluster.yaml renamed to deploy/ibm_cloud_code_engine/example-cluster.yaml.template

Lines changed: 2 additions & 2 deletions
@@ -26,7 +26,7 @@ provider:
 use_internal_ips: true
 
 # Namespace to use for all resources created.
-namespace:
+namespace: NAMESPACE
 
 services:
 # Service that maps to the head node of the Ray cluster.

@@ -120,7 +120,7 @@ available_node_types:
 spec:
 # Change this if you altered the autoscaler_service_account above
 # or want to provide your own.
-serviceAccountName: ap9fjwkf04j-writer
+serviceAccountName: NAMESPACE-writer
 
 restartPolicy: Never

docs/source/examples/hyperparameter.md

Lines changed: 9 additions & 10 deletions
@@ -18,10 +18,10 @@ limitations under the License.
 
 ### Tuning hyper-parameters with CodeFlare Pipelines
 
-GridSearchCV() is often used for hyper-parameter turning for a model constructed via sklearn pipelines. It does an exhaustive search over specified parameter values for a pipeline. It implements a `fit()` method and a `score()` method. The parameters of the pipeline used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
+`GridSearchCV()` is often used for hyper-parameter turning for a model constructed via sklearn pipelines. It does an exhaustive search over specified parameter values for a pipeline. It implements a `fit()` method and a `score()` method. The parameters of the pipeline used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
 Here we show how to convert an example of using `GridSearchCV()` to tune the hyper-parameters of an sklearn pipeline into one that uses Codeflare (CF) pipelines `grid_search_cv()`. We use the [Pipelining: chaining a PCA and a logistic regression](https://scikit-learn.org/stable/auto_examples/compose/plot_digits_pipe.html#sphx-glr-auto-examples-compose-plot-digits-pipe-py) from sklearn pipelines as an example.
 
-In this sklearn example, a pipeline is chained together with a PCA and a LogisticRegression. The n_components parameter of the PCA and the C parameter of the LogisticRegression are defined in a param_grid: with n_components in `[5, 15, 30, 45, 64]` and `C` defined by `np.logspace(-4, 4, 4)`. A total of 20 combinations of `n_components` and `C` parameter values will be explored by `GridSearchCV()` to find the best one with the highest `mean_test_score`.
+In this sklearn example, a pipeline is chained together with a PCA and a LogisticRegression. The `n_components` parameter of the PCA and the `C` parameter of the LogisticRegression are defined in a `param_grid`: with `n_components` in `[5, 15, 30, 45, 64]` and `C` defined by `np.logspace(-4, 4, 4)`. A total of 20 combinations of `n_components` and `C` parameter values will be explored by `GridSearchCV()` to find the best one with the highest `mean_test_score`.
 
 ```python
 pca = PCA()
@@ -40,14 +40,14 @@ print("Best parameter (CV score=%0.3f):" % search.best_score_)
 print(search.best_params_)
 ```
 
-After running `GridSearchCV().fit()`, the best parameters of `PCA__n_components` and `LogisticRegression__C`, together with the cross-validated mean_test scores are printed out as follows. In this example, the best n_components chosen is 45 for the PCA.
+After running `GridSearchCV().fit()`, the best parameters of `PCA__n_components` and `LogisticRegression__C`, together with the cross-validated `mean_test scores` are printed out as follows. In this example, the best `n_components` chosen is 45 for the PCA.
 
 ```python
 Best parameter (CV score=0.920):
 {'logistic__C': 0.046415888336127774, 'pca__n_components': 45}
 ```
 
-The PCA explained variance ratio and the best n_components chosen are plotted in the top chart. The classification accuracy and its std_test_score are plotted in the bottom chart. The best n_components can be obtained by calling best_estimator_.named_step['pca'].n_components from the returned object of GridSearchCV().
+The PCA explained variance ratio and the best `n_components` chosen are plotted in the top chart. The classification accuracy and its `std_test_score` are plotted in the bottom chart. The best `n_components` can be obtained by calling `best_estimator_.named_step['pca'].n_components` from the returned object of `GridSearchCV()`.
 
 ![](../images/pca_1.png)
 
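For readers skimming the diff, here is a self-contained sketch of the sklearn baseline the paragraphs above describe, assembled from the standard scikit-learn API; it mirrors the linked sklearn example rather than quoting this repository's notebook verbatim:

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Digits data used by the sklearn "chaining a PCA and a logistic regression" example
X_digits, y_digits = datasets.load_digits(return_X_y=True)

# Chain PCA and LogisticRegression into a single sklearn pipeline
pipe = Pipeline(steps=[("pca", PCA()), ("logistic", LogisticRegression(max_iter=10000))])

# 5 x 4 = 20 parameter combinations, as described in the text
param_grid = {
    "pca__n_components": [5, 15, 30, 45, 64],
    "logistic__C": np.logspace(-4, 4, 4),
}

# Exhaustive cross-validated search over the grid
search = GridSearchCV(pipe, param_grid)
search.fit(X_digits, y_digits)

print("Best parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)
print(search.best_estimator_.named_steps["pca"].n_components)
```

Note the double-underscore convention (`pca__n_components`, `logistic__C`) that `GridSearchCV` uses to address the parameters of individual pipeline steps.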
@@ -58,7 +58,7 @@ We next describe the step-by-step conversion of this example to one that uses Co
 
 #### **Step 1: importing codeflare.pipelines packages and ray**
 
-We need to first import various `codeflare.pipelines` packages, including Datamodel and runtime, as well as ray and call `ray.shutdwon()` and `ray.init()`. Note that, in order to run this CodeFlare example notebook, you need to have a running ray instance.
+We need to first import various `codeflare.pipelines` packages, including `Datamodel` and `runtime`, as well as `ray` and call `ray.shutdwon()` and `ray.init()`. Note that, in order to run this CodeFlare example notebook, you need to have a running ray instance.
 
 ```python
 import codeflare.pipelines.Datamodel as dm
@@ -73,7 +73,7 @@ ray.init()
 
 #### **Step 2: defining and setting up a codeflare pipeline**
 
-A codeflare pipeline is defined by EstimatorNodes and edges connecting two EstimatorNodes. In this case, we define node_pca and node_logistic and we connect these two nodes with `pipeline.add_edge()`. Before we can execute `fit()` on a pipeline, we need to set up the proper input to the pipeline.
+A codeflare pipeline is defined by `EstimatorNodes` and `edges` connecting two `EstimatorNodes`. In this case, we define `node_pca` and `node_logistic` and we connect these two nodes with `pipeline.add_edge()`. Before we can execute `fit()` on a pipeline, we need to set up the proper input to the pipeline.
 
 ```python
 pca = PCA()
@@ -88,10 +88,9 @@ pipeline_input = dm.PipelineInput()
 pipeline_input.add_xy_arg(node_pca, dm.Xy(X_digits, y_digits))
 ```
 
-#### **Step 3: defining pipeline param grid and executing**
+#### **Step 3: defining pipeline param grid and executing Codeflare pipelines `grid_search_cv()`**
 
-Codeflare pipelines grid_search_cv()
-Codeflare pipelines runtime converts an sklearn param_grid into a codeflare pipelines param grid. We also specify the default KFold parameter for running the cross-validation. Finally, Codeflare pipelines runtime executes the grid_search_cv().
+Codeflare pipelines runtime converts an sklearn param_grid into a codeflare pipelines param grid. We also specify the default `KFold` parameter for running the cross-validation. Finally, Codeflare pipelines runtime executes the `grid_search_cv()`.
 
 ```python
 # param_grid
@@ -112,7 +111,7 @@ result = rt.grid_search_cv(kf, pipeline, pipeline_input, pipeline_param)
 
 #### **Step 4: parsing the returned result from `grid_search_cv()`**
 
-As the Codeflare pipelines project is still actively under development, APIs to access some attributes of the explored pipelines in the `grid_search_cv()` are not yet available. As a result, a slightly more verbose code is used to get the best pipeline, its associated parameter values and other statistics from the returned object of `grid_search_cv()`. For example, we need to loop through all the 20 explored pipelines to get the best pipeline. And, to get the n_component of an explored pipeline, we first use `.get_nodes()` on the returned cross-validated pipeline and then use .get_estimator() and then finally use `.get_params()`.
+As the Codeflare pipelines project is still actively under development, APIs to access some attributes of the explored pipelines in the `grid_search_cv()` are not yet available. As a result, a slightly more verbose code is used to get the best pipeline, its associated parameter values and other statistics from the returned object of `grid_search_cv()`. For example, we need to loop through all the 20 explored pipelines to get the best pipeline. And, to get the `n_component` of an explored pipeline, we first use `.get_nodes()` on the returned cross-validated pipeline and then use `.get_estimator()` and then finally use `.get_params()`.
 
 ```python
 import statistics
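To make the four steps easier to follow outside the notebook, a rough end-to-end sketch is shown below. Only the calls that literally appear in the hunks above (`import codeflare.pipelines.Datamodel as dm`, `pipeline.add_edge()`, `dm.PipelineInput()`, `pipeline_input.add_xy_arg(node_pca, dm.Xy(X_digits, y_digits))`, `rt.grid_search_cv(kf, pipeline, pipeline_input, pipeline_param)`) are taken from the source; the `Runtime` import alias, the `Pipeline`/`EstimatorNode` constructors, and the param-grid conversion helper are assumptions about the `codeflare.pipelines` API and may differ from the actual notebook:

```python
import ray
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

import codeflare.pipelines.Datamodel as dm   # Step 1 (shown in the diff)
import codeflare.pipelines.Runtime as rt     # assumed import path for the "runtime" package

ray.shutdown()
ray.init()                                   # a running Ray instance is required (Step 1)

X_digits, y_digits = datasets.load_digits(return_X_y=True)
pca = PCA()
logistic = LogisticRegression(max_iter=10000)

# Step 2: the diff shows node_pca/node_logistic wired with pipeline.add_edge();
# the constructors below are assumed, not quoted from the notebook.
pipeline = dm.Pipeline()
node_pca = dm.EstimatorNode('pca', pca)
node_logistic = dm.EstimatorNode('logistic', logistic)
pipeline.add_edge(node_pca, node_logistic)

pipeline_input = dm.PipelineInput()
pipeline_input.add_xy_arg(node_pca, dm.Xy(X_digits, y_digits))   # shown in the diff

# Step 3: convert the sklearn param_grid and run the cross-validated grid search.
# The conversion helper is an assumption; the grid_search_cv() call is from the diff.
param_grid = {'pca__n_components': [5, 15, 30, 45, 64],
              'logistic__C': np.logspace(-4, 4, 4)}
pipeline_param = dm.PipelineParam.from_param_grid(param_grid)

kf = KFold(5)                                # default KFold for the cross-validation
result = rt.grid_search_cv(kf, pipeline, pipeline_input, pipeline_param)

# Step 4: the notebook then loops over the explored pipelines in `result`,
# using .get_nodes(), .get_estimator() and .get_params() to find the best one.
```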

docs/source/getting_started/starting.md

Lines changed: 12 additions & 12 deletions
@@ -71,7 +71,8 @@ export NAMESPACE=<namespace from above>
 
 Update With the following command you can download a basic Ray cluster definition and customize it for your namespace:
 ```shell
-sed "s/NAMESPACE/$NAMESPACE/" > ./example-cluster.yaml
+cd ./deploy/ibm_cloud_code_engine/
+sed "s/NAMESPACE/$NAMESPACE/" ./example-cluster.yaml.template > ./example-cluster.yaml
 ```
 
 This reference deployment file will create a Ray cluster with following characteristics:
@@ -163,17 +164,16 @@ pip3 install -r requirements.txt
 Assuming openshift cluster access from pre-reqs.
 
 a) Create namespace
-
-```
+```shell
 $ oc create namespace codefalre
 namespace/codeflare created
 $
-```
-
+```
+
 b) Bring up Ray cluster
-
-```
-$ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
+
+```
+$ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml
 Cluster: default
 
 Checking Kubernetes environment settings

@@ -247,8 +247,8 @@ pip3 install -r requirements.txt
 Connect to a terminal on the cluster head:
 ray attach /Users/darroyo/git_workspaces/github.com/ray-project/ray/python/ray/autoscaler/kubernetes/example-full.yaml
 Get a remote shell to the cluster manually:
-kubectl -n ray exec -it ray-head-ql46b -- bash
-```
+kubectl -n ray exec -it ray-head-ql46b -- bash
+```
 
 3. Verify
 a) Check for head node
@@ -262,7 +262,7 @@ pip3 install -r requirements.txt
 b) Run example test
 
 ```
-ray submit python/ray/autoscaler/kubernetes/example-full.yaml x.py
+ray submit ray/python/ray/autoscaler/kubernetes/example-full.yaml x.py
 Loaded cached provider configuration
 If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
 2021-02-09 08:50:51,028 INFO command_runner.py:171 -- NodeUpdater: ray-head-ql46b: Running kubectl -n ray exec -it ray-head-ql46b -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (python ~/x.py)'
@@ -276,4 +276,4 @@ Jupyter setup demo [Reference repository](https://github.com/erikerlandson/ray-o
 
 ### Running examples
 
-Once in a Jupyer envrionment, refer to [notebooks](../../notebooks) for example pipeline. Documentation for reference use cases can be found in [Examples](https://codeflare.readthedocs.io/en/latest/).
+Once in a Jupyer envrionment, refer to [notebooks](../../notebooks) for example pipeline. Documentation for reference use cases can be found in [Examples](https://codeflare.readthedocs.io/en/latest/).
