Verify Katib is working #973

Closed
jlewi opened this issue Jun 11, 2018 · 7 comments
@jlewi
Contributor

jlewi commented Jun 11, 2018

We'd like to verify that Katib is minimally working. In particular:

  • UI is accessible
  • Can run demo jobs
@lluunn
Contributor

lluunn commented Jun 12, 2018

ModelDB UI can be accessed by port-forwarding

kubectl -n kubeflow port-forward modeldb-frontend-58b49f7d6-cxjdv 3000:3000

On http://localhost:3000/katib/projects

screenshot from 2018-06-12 12-32-50
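
The port-forward command above hard-codes the pod's generated suffix, which changes whenever the pod is recreated. A minimal sketch for looking the pod up by name prefix instead; `find_pod` is a hypothetical helper, not part of Katib or kubectl, and it assumes the frontend pod's name still starts with `modeldb-frontend`:

```shell
# find_pod PREFIX
# Reads `kubectl get pods -o name` style lines on stdin (e.g. "pod/<name>")
# and prints the first pod name that starts with PREFIX.
find_pod() {
  sed 's#^pod/##' | grep -- "^$1" | head -n 1
}

# Usage (assumes kubectl is configured for the cluster):
#   POD=$(kubectl -n kubeflow get pods -o name | find_pod modeldb-frontend)
#   kubectl -n kubeflow port-forward "$POD" 3000:3000
```

If more than one pod matches the prefix, this picks the first one listed, which may not be the one that is ready.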

@lluunn
Contributor

lluunn commented Jun 12, 2018

Running demo script

go run git-issue-summarize-demo.go 

I can see something in the UI now:
http://localhost:3000/katib/projects/1/models
screenshot from 2018-06-12 14-01-19

However, seeing many errors like:

2018/06/12 13:57:46 WorkerID n2216617d688b5d6 start
2018/06/12 13:57:46 WorkerID h69866430461badd start
2018/06/12 13:57:56 GetMetErr rpc error: code = Unknown desc = the server rejected our request for an unknown reason (get pods h69866430461badd-d4lvj)
2018/06/12 13:58:06 GetMetErr rpc error: code = Unknown desc = the server rejected our request for an unknown reason (get pods h69866430461badd-d4lvj)
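
To see how often the error recurs and for which pods, the log can be tallied. A rough sketch, assuming the error lines keep the `(get pods <pod-name>)` form shown above; `count_getmeterr` is a hypothetical helper name:

```shell
# count_getmeterr
# Reads the demo script's log on stdin and prints "<pod-name> <count>"
# for every pod named in a GetMetErr line.
count_getmeterr() {
  awk '/GetMetErr/ && match($0, /\(get pods [^)]*\)/) {
         # Strip the leading "(get pods " (10 chars) and the trailing ")".
         pod = substr($0, RSTART + 10, RLENGTH - 11)
         n[pod]++
       }
       END { for (p in n) print p, n[p] }' | sort
}

# Usage (2>&1 assumes the script logs to stderr, the default for Go's log package):
#   go run git-issue-summarize-demo.go 2>&1 | count_getmeterr
```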

@lluunn
Contributor

lluunn commented Jun 12, 2018

After many GetMetErr messages, the script finished with:

2018/06/12 14:02:08 GetMetErr rpc error: code = Unknown desc = the server rejected our request for an unknown reason (get pods h69866430461badd-d4lvj)
2018/06/12 14:06:48 All Worker Completed!

But I don't see any models in ModelDB UI.

cc @YujiOshima any idea about the rpc error? Should I see some models after running this script?

@lluunn
Contributor

lluunn commented Jun 12, 2018

Tried again; this time there was no rpc error.

2018/06/12 15:29:27 Study ID x1b23b1e66e7304e
2018/06/12 15:29:27 Study ID x1b23b1e66e7304e StudyConfname:"grid-demo" owner:"katib" optimization_type:MAXIMIZE optimization_goal:0.99 parameter_configs:<configs:<name:"--learning_rate" parameter_type:DOUBLE feasible:<max:"0.5" min:"0.005" > > > default_suggestion_algorithm:"grid" default_early_stopping_algorithm:"medianstopping" objective_value_name:"Validation-accuracy" metrics:"accuracy" metrics:"Validation-accuracy"
2018/06/12 15:29:27 Grid Prameter ID h98395ad40bae856
2018/06/12 15:29:27 Get Grid Suggestions:
2018/06/12 15:29:27 trial_id:"nd79c7199b993c0d" study_id:"x1b23b1e66e7304e" parameter_set:<name:"--learning_rate" parameter_type:DOUBLE value:"0.0050" >
2018/06/12 15:29:27 trial_id:"y90dfd19fac3e85d" study_id:"x1b23b1e66e7304e" parameter_set:<name:"--learning_rate" parameter_type:DOUBLE value:"0.5000" >
2018/06/12 15:29:27 WorkerID ve0b20c693622501 start
2018/06/12 15:29:28 WorkerID gc3c96f5b3838b5c start
2018/06/12 15:33:08 All Worker Completed!

But there are still no models in the UI.

@YujiOshima
Contributor

@lluunn It looks like a problem with the training script.
The training script may not print results. I copied it from here

Please try client-test.
If you want to use GPU, please edit worker config
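
One way to check whether a worker actually printed any metrics is to read its pod log. A sketch under assumptions: `worker_ids` is a hypothetical helper, worker pods are assumed to be named after the WorkerID plus a generated suffix (as in the `get pods h69866430461badd-d4lvj` message earlier in the thread), and they are assumed to run in the `kubeflow` namespace and to still exist after the run:

```shell
# worker_ids
# Reads the demo script's log on stdin and prints one WorkerID per line,
# taken from the "WorkerID <id> start" messages.
worker_ids() {
  sed -n 's/.*WorkerID \([^ ]*\) start.*/\1/p'
}

# Usage (demo.log is a saved copy of the script output):
#   for id in $(worker_ids < demo.log); do
#     pod=$(kubectl -n kubeflow get pods -o name | grep "$id" | head -n 1)
#     [ -n "$pod" ] && kubectl -n kubeflow logs "$pod" | grep -i accuracy
#   done
```

An empty result for a pod would be consistent with the training script not printing the metrics named in the study config.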

@lluunn
Copy link
Contributor

lluunn commented Jun 13, 2018

Thank you @YujiOshima !
I was able to run `go run client-example.go` and see models in the UI:

screenshot from 2018-06-13 10-54-48

screenshot from 2018-06-13 10-55-28

@lluunn lluunn mentioned this issue Jun 13, 2018
@lluunn
Contributor

lluunn commented Jun 13, 2018

It's (barely) working now. Filed #991, closing this one.

@lluunn lluunn closed this as completed Jun 13, 2018