Remove TensorBoard related code in operator (#391)
Ref #347

Blocked until CI is running again.

PS: Dashboard code is not changed.

Signed-off-by: Ce Gao <ce.gao@outlook.com>
gaocegege authored and jlewi committed Feb 27, 2018
1 parent a814670 commit 0759f7a
Showing 23 changed files with 12 additions and 928 deletions.
107 changes: 0 additions & 107 deletions README.md
@@ -158,108 +158,6 @@ spec:
Follow TensorFlow's [instructions](https://www.kubeflow.org/tutorials/using_gpu)
for using GPUs.

### Requesting a TensorBoard instance

You can also ask the `TFJob` operator to create a TensorBoard instance
by including a [TensorBoardSpec](https://github.com/kubeflow/tf-operator/blob/master/pkg/apis/tensorflow/v1alpha1/types.go#L95)
in your job. The table below describes the important fields in
[TensorBoardSpec](https://github.com/kubeflow/tf-operator/blob/master/pkg/apis/tensorflow/v1alpha1/types.go#L95).

| Name | Description | Required | Default |
|---|---|---|---|
| `logDir` | Specifies the directory where TensorBoard will look to find TensorFlow event files that it can display | Yes | `None` |
| `volumes` | `Volumes` information that will be passed to the TensorBoard `deployment` | No | [] |
| `volumeMounts` | `VolumeMounts` information that will be passed to the TensorBoard `deployment` | No | [] |
| `serviceType` | `ServiceType` information that will be passed to the TensorBoard `service`| No | `ClusterIP` |

#### TensorBoard on Azure

On Azure you can store your event files on an Azure Files and use
volumes to make them available to TensorBoard.

```
apiVersion: "kubeflow.org/v1alpha1"
kind: "TFJob"
metadata:
name: "tf-smoke-gpu"
spec:
replica_specs:
- replicas: 1
tfReplicaType: MASTER
template:
spec:
containers:
- image: gcr.io/tf-on-k8s-dogfood/tf_sample_gpu:latest
name: tensorflow
resources:
limits:
alpha.kubernetes.io/nvidia-gpu: 1
restartPolicy: OnFailure
tensorboard:
logDir: /tmp/tensorflow
volumes:
- name: azurefile
azureFile:
secretName: azure-secret
shareName: data
readOnly: false
volumeMounts:
- mountPath: /tmp/tensorflow
name: azurefile
```

#### TensorBoard on GKE

On GKE you can store your event files on GCS and TensorBoard/TensorFlow
can read/write directly to GCS.

```
apiVersion: "kubeflow.org/v1alpha1"
kind: "TFJob"
metadata:
name: "tf-smoke-gpu"
spec:
replica_specs:
- replicas: 1
tfPort: 2222
tfReplicaType: MASTER
template:
spec:
containers:
- image: gcr.io/tf-on-k8s-dogfood/tf_sample_gpu:latest
name: tensorflow
args:
- --log_dir=gs://my-bucket/logdir
resources:
limits:
alpha.kubernetes.io/nvidia-gpu: 1
restartPolicy: OnFailure
tensorboard:
logDir: gs://my-bucket/logdir
```

#### Connecting to TensorBoard

The TFJob operator will create a service named
**tensorboard-$RUNTIME_ID** for your job. You can connect to it
using the Kubernetes API Server proxy as follows

Start the K8s proxy
```
kubectl proxy
```

In a web-browser open up

```
http://${PROXY}:8001/api/v1/proxy/namespaces/default/services/tensorboard-${RUNTIME_ID}:80/
```

Depending on how you configure the service for TensorBoard and cluster
you can make TensorBoard available without using the K8s proxy.

## Monitoring your job

To get the status of your job
@@ -324,11 +222,6 @@ spec:
restartPolicy: OnFailure
tfPort: 2222
tfReplicaType: PS
tensorboard:
logDir: /tmp/tensorflow
serviceType: ""
volumeMounts: null
volumes: null
tfImage: tensorflow/tensorflow:1.3.0
status:
conditions: null
27 changes: 3 additions & 24 deletions dashboard/backend/handler/api_handler.go
@@ -24,11 +24,10 @@ type APIHandler struct {
}

// TFJobDetail describe the specification of a TFJob
-// as well as related TensorBoard service if any and related pods
+// if any and related pods
type TFJobDetail struct {
-TFJob *v1alpha1.TFJob `json:"tfJob"`
-TbService *v1.Service `json:"tbService"`
-Pods []v1.Pod `json:"pods"`
+TFJob *v1alpha1.TFJob `json:"tfJob"`
+Pods []v1.Pod `json:"pods"`
}

// TFJobList is a list of TFJobs
@@ -140,26 +139,6 @@ func (apiHandler *APIHandler) handleGetTFJobDetail(request *restful.Request, res
TFJob: job,
}

if job.Spec.TensorBoard != nil {
tbSpec, err := apiHandler.cManager.ClientSet.CoreV1().Services(namespace).List(metav1.ListOptions{
LabelSelector: fmt.Sprintf("kubeflow.org=,app=tensorboard,runtime_id=%s", job.Spec.RuntimeId),
})
if err != nil {
log.Warningf("failed to list TensorBoard for TFJob %v under namespace %v, error: %v", job.Name, job.Namespace, err)
// TODO maybe partial result?
response.WriteError(http.StatusNotFound, err)
return
} else if len(tbSpec.Items) > 0 {
// Should never be more than 1 service that matched, handle error
// Handle case where no TensorBoard is found
tfJobDetail.TbService = &tbSpec.Items[0]
log.Warningf("more than one TensorBoards found for TFJob %v under namespace %v, this should be impossible",
job.Name, job.Namespace)
} else {
log.Warningf("Couldn't find a TensorBoard service for TFJob %v under namespace %v", job.Name, job.Namespace)
}
}

// Get associated pods
pods, err := apiHandler.cManager.ClientSet.CoreV1().Pods(namespace).List(metav1.ListOptions{
LabelSelector: fmt.Sprintf("kubeflow.org=,runtime_id=%s", job.Spec.RuntimeId),
26 changes: 0 additions & 26 deletions dashboard/frontend/src/components/CreateJob.js
@@ -8,7 +8,6 @@ import RaisedButton from "material-ui/RaisedButton";
import { withRouter } from "react-router-dom";

import { createTFJobService } from "../services";
import CreateTensorBoard from "./CreateTensorBoard";
import RequiredTextField from "./RequiredTextField";
import VolumeCreator from "./VolumeCreator";
import EnvVarCreator from "./EnvVarCreator";
@@ -33,8 +32,6 @@ class CreateJob extends Component {
psImage: "",
psCommand: "",
psArgs: "",
tbIsPresent: false,
tbSpec: {},
masterVolumeSpec: {},
workerVolumeSpec: {},
psVolumeSpec: {},
@@ -44,7 +41,6 @@ class CreateJob extends Component {
};

this.handleInputChange = this.handleInputChange.bind(this);
this.setTensorboardSpec = this.setTensorboardSpec.bind(this);
this.cancel = this.cancel.bind(this);
this.deploy = this.deploy.bind(this);
this.setMasterVolumesSpec = this.setMasterVolumesSpec.bind(this);
@@ -55,10 +51,6 @@ class CreateJob extends Component {
this.setPSEnvVars = this.setPSEnvVars.bind(this);
}

setTensorboardSpec(tbSpec) {
this.setState({ tbSpec });
}

handleInputChange(event) {
const target = event.target;
const value = target.type === "checkbox" ? target.checked : target.value;
@@ -230,20 +222,6 @@ class CreateJob extends Component {
<VolumeCreator setVolumesSpec={this.setPSVolumesSpec} />
</div>
)}

{/* TENSORBOARD */}
<Divider style={this.styles.divider} />
<h3 style={this.styles.header}>TensorBoard</h3>
<Toggle
label="Enabled"
defaultToggled={false}
name="tbIsPresent"
onToggle={this.handleInputChange}
style={this.styles.toggle}
/>
{this.state.tbIsPresent && (
<CreateTensorBoard setTensorBoardSpec={this.setTensorboardSpec} />
)}
</CardText>
<CardActions>
<RaisedButton label="Deploy" primary={true} onClick={this.deploy} />
@@ -302,10 +280,6 @@ class CreateJob extends Component {
}
};

if (this.state.tbIsPresent) {
spec.spec.tensorboard = this.state.tbSpec;
}

createTFJobService(spec)
.then(() => this.props.history.push("/"))
.catch(console.error);
86 changes: 0 additions & 86 deletions dashboard/frontend/src/components/CreateTensorBoard.js

This file was deleted.

4 changes: 0 additions & 4 deletions dashboard/frontend/src/components/Job.js
@@ -1,7 +1,6 @@
import React, { Component } from "react";
import JobDetail from "./JobDetail.js";
import ReplicaSpec from "./ReplicaSpec.js";
import TensorBoard from "./TensorBoard.js";
import { Card, CardText } from "material-ui/Card";
import { getTFJobService } from "../services";

@@ -35,9 +34,6 @@ class Job extends Component {
<div style={this.divStyle}>
<JobDetail tfjob={job} />
</div>
<div style={this.divStyle}>
<TensorBoard service={this.state.tbService} />
</div>
{replicaSpecs}
</div>
);
41 changes: 0 additions & 41 deletions dashboard/frontend/src/components/TensorBoard.js

This file was deleted.

3 changes: 0 additions & 3 deletions developer_guide.md
@@ -22,9 +22,6 @@ $ tree -d -I 'vendor|bin|.git'
├── docs
│   └── diagrams
├── examples
│   ├── charts
│   │   └── tensorboard
│   │   └── templates
│   ├── crd
│   ├── gke
│   │   └── notebook_image