PyGlove interface errors with google-vizier 0.1.13 #1044

gelatinouscube42 · 2024-01-23T17:04:54Z

There was an error in backend.py that caused the library imports to fail: some extra text that needed to be deleted.

Then, in core.py, there is an attempted import of "metadata_to_user" from vizier.google, which breaks the import of that file.

Looking at this repo, it seems these were already addressed at some point but perhaps not pushed to PyPI.

sagipe · 2024-01-23T17:16:49Z

Apologies, it should work fine at HEAD.
I'm sending a PR to release version 0.1.14 to PyPI now.
When it's released, please upgrade to 0.1.14.
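To confirm which release is installed before and after upgrading, a small standard-library check can help. This is a sketch, not part of Vizier: the helper names are hypothetical, and the version parsing is deliberately simplified to numeric release components only.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Returns the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def parse_release(version: str) -> Tuple[int, ...]:
    """Extracts the leading numeric release, e.g. '0.1.14rc1' -> (0, 1, 14).

    Simplified: pre-release and local suffixes are ignored, not ordered.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(version: str, minimum: str) -> bool:
    # Tuple comparison orders (0, 1, 13) < (0, 1, 14) < (0, 2).
    return parse_release(version) >= parse_release(minimum)


if __name__ == "__main__":
    found = installed_version("google-vizier")
    if found is None:
        print("google-vizier is not installed")
    elif is_at_least(found, "0.1.14"):
        print(f"google-vizier {found} is 0.1.14 or newer")
    else:
        print(f"google-vizier {found} is older than 0.1.14; please upgrade")
```

Where the packaging library is installed, packaging.version.Version gives fully PEP 440-compliant comparisons instead of this simplified parser.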
Best,
Sagi

gelatinouscube42 · 2024-01-23T18:44:24Z

Thanks for your quick response.

For the record, I pulled your latest commit and am now having an issue importing vizier_server from vizier.service. Trying to track this down now.

I'm trying to follow the example in the docs for using Vizier as a backend for PyGlove. A more complete example would be helpful, for what it's worth. For example, it's not clear from that section how pg_vizier.init("my_study") creates an object that can be used to query for the optimal trials, as in the examples in other sections.

sagipe · 2024-01-23T19:17:51Z

Our unit tests and Colab notebooks run fine with version 0.1.14.

If you're having an issue importing vizier_server, and you are doing this in Colab, perhaps try to restart the Colab runtime first?
What error are you getting?

Re the PyGlove question, see an example of querying the result in
https://github.com/google/vizier/blob/main/vizier/_src/pyglove/e2e_test.py#L42
which uses this Result object:
https://github.com/google/vizier/blob/main/vizier/_src/pyglove/core.py#L350

result = pg.poll_result('')
result.trials
result.best_trials

We can add an example to the docs as well.

Best,
Sagi

xingyousong · 2024-01-23T19:47:12Z

(FYI) The tutorial for running PyGlove with OSS Vizier is here: https://oss-vizier.readthedocs.io/en/latest/advanced_topics/pyglove/vizier_as_backend.html

Is there something missing / not working about it?

gelatinouscube42 · 2024-01-23T20:40:49Z

@xingyousong

I will provide more detail as I work through the example on my machine. With respect to the PyGlove example specifically, it was not at all clear to me what the line

pg_vizier.init("my_study")

is doing. It seems to use a different interface than the Vizier Basics examples, which have you initialize a server and a client separately and then use the client to query the database for the results.

A slightly separate but still relevant issue: the examples do not appear to show anywhere how the datastore is initialized, or how to configure it to use a pre-existing database. I'm planning to run my tuning experiments when the machines on my network are otherwise idle, so I need the database to persist. The answer will probably be obvious once I find the relevant code in the repo, but doubts like this are slowing me down.

gelatinouscube42 · 2024-01-23T20:46:00Z

Related to the database concern: I thought I had it running on my machine, since one run completed without error, but on another run I get the error

"Failed to find study name: my_study.basic_run"

Presumably the study should have been created somewhere behind the scenes, but it's not clear where.

Edit:
When I run the PyGlove tests via

bash run_tests.sh pyglove

I consistently get two failures, both of which seem to point to a study not being found. The failed tests are
performance_test.py::PerformanceTest::test_multiple_workers0
and
oss_vizier_test.py::OSSVizierSampleTest::testSamplingWithMultiObjectiveAlgorithm

I am also intermittently getting the error "Cannot start already-started server!", which appears to come from an attempt to start a new Pythia process. I'm not getting these errors when running the Vizier Basics example.

Edit:
Hypothesis for what is going on: the error messages come from the call to _setup_study in backend.py. They appear to be intentional, since the try/except seems to serve as de facto control flow, hinging on whether or not a study with a particular name has already been created.

On my machine, practically all of the threads query the database for the study before another thread has created it, causing an error.

By the time this error is handled, it appears the RPCs have been killed, either by a timeout or simply because an error occurred. I get the error:

<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Exception calling application: 'Failed to find study name: owners/<username>/studies/my_study.worker_run'"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-01-24T11:58:37.20558528-05:00", grpc_status:2, grpc_message:"Exception calling application: \'Failed to find study name: owners/<username>/studies/my_study.worker_run\'"}"

This reads to me as though the RPC terminates when that exception is first encountered, but sometimes my process crashes with:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Exception calling application: 'Failed to find trial name: owners/<username>/studies/my_study.worker_run/trials/10'"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"Exception calling application: \'Failed to find trial name: owners/<username>/studies/my_study.worker_run/trials/10\'", grpc_status:2, created_time:"2024-01-24T12:07:09.54003539-05:00"}"

This leads me to believe that sometimes the service does manage to create and start the study, and gets as far as running trials before crashing, which points more toward something like a socket timeout. Not sure.
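The hypothesis above describes a check-then-create race: several workers look up the same study name, and those that look before any worker has finished creating it fail. A minimal, self-contained sketch of that pattern and one common fix (a single lock held across the check and the creation) follows. StudyStore, get_or_create, and run_workers are hypothetical names; this is not Vizier's datastore or the actual _setup_study logic, and it does not model the RPC layer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class StudyStore:
    """Hypothetical in-memory stand-in for a datastore holding studies."""

    def __init__(self) -> None:
        self._studies: Dict[str, List[int]] = {}
        self._lock = threading.Lock()
        self.creations = 0  # How many times a study was actually created.

    def get_or_create(self, name: str) -> None:
        # Holding one lock across the existence check and the creation makes
        # check-then-create atomic: exactly one worker creates the study, and
        # no worker can observe the name before it exists.
        with self._lock:
            if name not in self._studies:
                self._studies[name] = []
                self.creations += 1

    def add_trial(self, name: str, trial_id: int) -> None:
        with self._lock:
            # Raises KeyError if the study was never created, loosely
            # analogous to a "Failed to find study name" error.
            self._studies[name].append(trial_id)

    def num_trials(self, name: str) -> int:
        with self._lock:
            return len(self._studies[name])


def run_workers(store: StudyStore, name: str, num_workers: int,
                trials_per_worker: int) -> None:
    def worker(worker_id: int) -> None:
        store.get_or_create(name)
        for i in range(trials_per_worker):
            store.add_trial(name, worker_id * trials_per_worker + i)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() forces iteration so any worker exception is re-raised here.
        list(pool.map(worker, range(num_workers)))


if __name__ == "__main__":
    store = StudyStore()
    run_workers(store, "my_study.worker_run", num_workers=10, trials_per_worker=10)
    print(store.creations, store.num_trials("my_study.worker_run"))  # prints: 1 100
```

An alternative that needs no locking is to create the study once from the main thread before starting the pool, so workers only ever look it up; whether pg.sample exposes a clean hook for that is not established in this thread.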

gelatinouscube42 · 2024-01-24T14:42:48Z

@sagipe

With regard to importing vizier_server: it works if I change the import to vizier._src.service, but still does not work if I try to import from vizier.service.

Error:
ImportError: cannot import name 'vizier_server' from 'vizier.service' (/vizier/vizier/service/__init__.py)

I did pull the recent changes and re-install before trying this, fwiw.
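Until the public import path works, one hedged workaround is to try the public path first and fall back to the private one that worked above. The helper below is a generic standard-library sketch; the name import_first is hypothetical, and importing from vizier._src relies on a private package layout that may change between releases.

```python
import importlib
from types import ModuleType


def import_first(*candidates: str) -> ModuleType:
    """Imports and returns the first module path that imports successfully."""
    errors = []
    for path in candidates:
        try:
            return importlib.import_module(path)
        except ImportError as exc:  # Also covers ModuleNotFoundError.
            errors.append(f"{path}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


# Example (requires google-vizier; the module paths are the ones discussed above):
# vizier_server = import_first(
#     "vizier.service.vizier_server",
#     "vizier._src.service.vizier_server",
# )
```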

Edit:
For the record, I'm trying to explicitly initialize the service, since that seems to be the only way I can find to connect to a pre-existing database. Even with this import and explicit server initialization, the "Failed to find study name" errors still occur, and they seem to trace back to something in the interaction with the datastore...

xingyousong · 2024-01-24T18:19:03Z

There are a few facts that might help this thread overall:

Our installation requires building protos (https://github.com/google/vizier/blob/main/setup.py#L55), so it's worth making sure the setup really worked. Since we've uploaded 0.1.14, try removing everything related to Vizier from your computer and doing a completely fresh reinstall: pip install google-vizier==0.1.14.
The SQL database file path is set here: https://github.com/google/vizier/blob/main/vizier/_src/service/constants.py#L41. By default, the vizier.db file is stored inside your /vizier/_src/service/... folder. You can optionally change this database URL when starting a fresh server: https://github.com/google/vizier/blob/main/vizier/_src/service/vizier_server.py#L51
pg_vizier.init(...) is defined here: https://github.com/google/vizier/blob/main/vizier/_src/pyglove/oss_vizier.py#L264. It's essentially a PyGlove version of a regular Vizier client, where the server address needs to be specified.
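On the earlier question about keeping the datastore between runs: since the default database is a vizier.db file and the server can be started with a different database URL, one option is to point it at a file outside the installed package. The sketch below only builds such a URL and uses the standard-library sqlite3 module to show that a file-backed SQLite database survives across connections. The SQLAlchemy-style sqlite:/// URL form is an assumption about what the server expects, and sqlite_url and write_then_reopen are hypothetical helpers; check constants.py and vizier_server.py (linked above) for the actual constant and parameter names.

```python
import os
import sqlite3
import tempfile


def sqlite_url(db_path: str) -> str:
    """Builds a SQLAlchemy-style URL for a file-backed SQLite database.

    On Unix, /data/vizier.db becomes sqlite:////data/vizier.db (three slashes
    for the scheme plus the leading slash of the absolute path).
    """
    return "sqlite:///" + os.path.abspath(db_path)


def write_then_reopen(db_path: str) -> int:
    # First "run": create a table if needed, insert a row, and close.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS runs (name TEXT)")
    conn.execute("INSERT INTO runs VALUES (?)", ("my_study",))
    conn.commit()
    conn.close()
    # Second "run": a fresh connection still sees every committed row.
    conn = sqlite3.connect(db_path)
    (count,) = conn.execute("SELECT COUNT(*) FROM runs").fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vizier.db")
        print(sqlite_url(path))
        print(write_then_reopen(path), write_then_reopen(path))  # prints: 1 2
```

For a real setup, the file would live somewhere persistent (not a temporary directory) and the resulting URL would be passed when starting the server, per the vizier_server.py link; the exact keyword is not confirmed here.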

gelatinouscube42 · 2024-01-24T19:04:50Z

@xingyousong

Did as you suggested, still have the same errors.

Btw, I had run the script to compile the protos after pulling from the repo and reinstalling the local copy, so I think that part should have been fine.

xingyousong · 2024-01-24T19:41:20Z

@gelatinouscube42 can you send a code snippet to reproduce this issue?

gelatinouscube42 · 2024-01-24T19:45:50Z

Sure, see below. It is more or less taken verbatim from the example in the docs...

import multiprocessing
import multiprocessing.pool
import os

import pyglove as pg
from vizier import pyglove as pg_vizier
from vizier._src.service import vizier_server

search_space = pg.Dict(x=pg.floatv(0.0, 1.0), y=pg.floatv(0.0, 1.0))
algorithm = pg.evolution.regularized_evolution()
num_trials = 100


def evaluator(value: pg.Dict):
    return value.x**2 - value.y**2

# Start a local Vizier server and point the PyGlove backend at its endpoint.
server = vizier_server.DefaultVizierServer()
pg_vizier.init("my_study", vizier_endpoint=server.endpoint)

num_workers = 10

def work_fun(worker_id):
    print(f"Worker ID: {worker_id}")
    for value, feedback in pg.sample(
        search_space,
        algorithm=algorithm,
        num_examples=num_trials // num_workers,
        name='worker_run',
        ):
        reward = evaluator(value)
        feedback(reward=reward)

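# Run the workers concurrently in threads; each samples its share of trials
# under the same study name.
with multiprocessing.pool.ThreadPool(num_workers) as pool:
    pool.map(work_fun, range(num_workers))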


xingyousong closed this as completed Jan 23, 2024

xingyousong reopened this Jan 23, 2024

copybara-service bot pushed a commit that referenced this issue Jan 31, 2024

Fix endpoint issues in pyglove colab. Should resolve #1044

f1d757f

PiperOrigin-RevId: 602940226

copybara-service bot mentioned this issue Jan 31, 2024

Fix endpoint issues in pyglove colab. Should resolve https://github.com/google/vizier/issues/1044 #1047

Merged

copybara-service bot pushed a commit that referenced this issue Jan 31, 2024

Fix endpoint issues in pyglove colab. Should resolve #1044

226e467

PiperOrigin-RevId: 602940226

copybara-service bot pushed a commit that referenced this issue Jan 31, 2024

Fix endpoint issues in pyglove colab. Should resolve #1044

b0e7cd1

PiperOrigin-RevId: 602940226

copybara-service bot pushed a commit that referenced this issue Jan 31, 2024

Fix endpoint issues in pyglove colab. Should resolve #1044

aa61900

PiperOrigin-RevId: 602940226

copybara-service bot closed this as completed in a15e773 Jan 31, 2024
