
Multi-objective experiments generate duplicated data #2392

Closed
HuizhiXu opened this issue Apr 23, 2024 · 6 comments

@HuizhiXu

Hello,

I have been utilizing the multi-objective optimization approach for my specific use case. Here is a brief overview of the code I have implemented:

from ax.modelbridge.registry import Models
from ax.models.torch.botorch_moo_defaults import get_NEHVI

# Fit a multi-objective model to the experiment's current data, using
# a noisy expected hypervolume improvement acquisition function.
model = Models.MOO(
    experiment=self.experiment,
    data=self.experiment.fetch_data(),
    acqf_constructor=get_NEHVI,
)

# Ask the model for one new candidate point.
generator_run = model.gen(1)

The code has been functioning effectively up until now. However, I have recently encountered an error. It appears that the generator may be producing duplicate data points during its operation.

[Screenshot 2024-04-23 17:11: trial table showing the duplicate]

Trial 23 and Trial 24 have the same parameters.

Upon investigating potential causes, I found a similar issue: "Completed Multi-objective NAS experiments without metrics and repeated parameter selections". The user in that issue also faced duplicate data generation.

The resolution suggested there was to use the Ax Service API and set the should_deduplicate parameter to True in the call to choose_generation_strategy.
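
For context, that call looks roughly like this in the Service API (the one-parameter search space below is only a toy example):

from ax.core.parameter import ParameterType, RangeParameter
from ax.core.search_space import SearchSpace
from ax.modelbridge.dispatch_utils import choose_generation_strategy

# Toy search space; the relevant part is should_deduplicate=True, which
# tells the generation strategy to retry when it draws an arm that
# already exists on the experiment.
search_space = SearchSpace(
    parameters=[
        RangeParameter(
            name="x", parameter_type=ParameterType.FLOAT, lower=0.0, upper=1.0
        ),
    ]
)
generation_strategy = choose_generation_strategy(
    search_space=search_space,
    should_deduplicate=True,
)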

Given this, I am wondering whether there is an equivalent parameter in the Ax Developer API that I could use to address duplicate data generation in my case. If such a parameter exists, I would appreciate guidance on how to use it in my current setup.

@bernardbeckerman bernardbeckerman self-assigned this Apr 23, 2024
@bernardbeckerman
Contributor

Taking a look!

@bernardbeckerman
Contributor

@HuizhiXu Thanks for reporting! should_deduplicate is an argument to GenerationStep/GenerationNode, and the associated logic is handled here, so you are welcome to look to that as an example. However, we generally recommend switching to the Ax Service API whenever possible, since it provides a better experience and we are able to offer better support for it. Are there any particular reasons the Service API does not fit your needs?
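
For illustration, here is a rough sketch of a GenerationStrategy whose steps opt into deduplication (the models and trial counts are arbitrary placeholders):

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

# Each step deduplicates its own candidates by retrying generation
# whenever a previously seen arm is drawn.
generation_strategy = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,
            should_deduplicate=True,
        ),
        GenerationStep(
            model=Models.MOO,
            num_trials=-1,  # no cap on trials from this step
            should_deduplicate=True,
        ),
    ]
)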

@bernardbeckerman
Contributor

Also, to address your initial issue, Ax sometimes suggests previously sampled arms if there is observational noise in your metrics, or if Ax is unsure of the degree of observational noise, in order to get a better idea of the true mean and noise at a given point in the search space. If you would like to prevent Ax from re-suggesting points, please see my comment above for some pointers.
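
As a toy illustration of why repeats can be informative under noise (generic statistics, not Ax-specific code): averaging n noisy observations of the same arm shrinks the standard error of the mean by a factor of √n.

import math

noise_sd = 2.0  # assumed observation noise for a single metric reading
for n_repeats in (1, 2, 4):
    sem = noise_sd / math.sqrt(n_repeats)
    print(f"{n_repeats} observation(s): standard error of the mean = {sem:.2f}")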

@HuizhiXu
Author

@bernardbeckerman Thank you for your response and guidance.

The reasons I chose the Developer API over the Service API are as follows:

  • Your website states that the Developer API supports more customization; since we want to benchmark different optimization methods, we chose this option.
  • The Developer API has a Pydantic-style schema format.

For example, the Service API's parameter constraints are defined as:

parameter_constraints=["A + B + C + D + E <= 100.0"],

In contrast, the Developer API allows for:

# Typed constraint: the listed parameters must sum to at most the bound.
parameter_constraints.append(
    SumConstraint(
        parameters=parameters,
        is_upper_bound=True,
        bound=param_cons.bound,
    )
)

Additionally, the Developer API's settings for the SearchSpace and the objective configuration are, I find, more aligned with the Pydantic style, as in the sketch below.
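
For instance, a rough sketch of the typed style I mean (the parameter and metric names here are made up):

from ax.core.metric import Metric
from ax.core.objective import MultiObjective, Objective
from ax.core.optimization_config import MultiObjectiveOptimizationConfig
from ax.core.parameter import ParameterType, RangeParameter
from ax.core.search_space import SearchSpace

# Explicitly typed search space and multi-objective configuration.
search_space = SearchSpace(
    parameters=[
        RangeParameter(
            name="A", parameter_type=ParameterType.FLOAT, lower=0.0, upper=100.0
        ),
        RangeParameter(
            name="B", parameter_type=ParameterType.FLOAT, lower=0.0, upper=100.0
        ),
    ]
)
optimization_config = MultiObjectiveOptimizationConfig(
    objective=MultiObjective(
        objectives=[
            Objective(metric=Metric(name="accuracy"), minimize=False),
            Objective(metric=Metric(name="latency"), minimize=True),
        ]
    )
)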

Overall, however, I am not aware of what other specific advantages the Developer API has over the Service API.

Furthermore, I would like to know whether the following approach to deduplication in my code is correct.

I have defined an arms_by_signature_for_deduplication dictionary to store the signature of each arm. After each new trial is generated, I check whether the signature of the new arm is already in arms_by_signature_for_deduplication; if it is, I mark the trial as abandoned.

Here is the code snippet:

def deduplicated_arm(self, arm):
    # True if this arm's parameterization was generated before.
    return arm.signature in self.arms_by_signature_for_deduplication


def generate(self, i, model_moo):
    generator_run = model_moo.gen(n=1)
    trial = self.experiment.new_trial(generator_run=generator_run)
    arm = trial.arms[0]
    if self.deduplicated_arm(arm):
        logger.info("Duplicated arm")
        trial.mark_abandoned(reason="duplication")
        return

    logger.info("Not duplicated arm")
    self.arms_by_signature_for_deduplication[arm.signature] = arm

    ...

@bernardbeckerman
Contributor

bernardbeckerman commented Apr 25, 2024

  • Your website states that the Developer API supports more customization; since we want to benchmark different optimization methods, we chose this option.
  • The Developer API has a Pydantic-style schema format.

These are excellent points! I'll discuss them with the team and see if there's an opportunity to clarify our documentation and/or move toward typed inputs in the Service API. Thanks for raising them!

Furthermore, I would like to know whether the following approach to deduplication in my code is correct.

This looks good, although to match the Ax-internal logic linked above more closely, it might be better to check for duplication on the generator run's arms themselves and retry gen until you either succeed in generating a unique arm or reach some maximum number of draws:

class RepeatedPointsException(Exception):
    """Raised when gen keeps returning previously seen arms."""


def is_duplicate(arm, arms_by_signature_for_deduplication):
    # True if this arm's parameterization has been generated before.
    return arm.signature in arms_by_signature_for_deduplication


def generate(self, model_moo, max_gen_draws_for_deduplication=5):  # 5 = example retry budget
    arm_is_unique = False
    n_gen_draws = 0
    while not arm_is_unique and n_gen_draws < max_gen_draws_for_deduplication:
        generator_run = model_moo.gen(n=1)
        arm_is_unique = not is_duplicate(
            generator_run.arms[0], self.arms_by_signature_for_deduplication
        )
        n_gen_draws += 1
    if not arm_is_unique:
        raise RepeatedPointsException(
            f"Could not generate a unique arm after "
            f"{max_gen_draws_for_deduplication} tries."
        )
    trial = self.experiment.new_trial(generator_run=generator_run)
    arm = trial.arms[0]
    self.arms_by_signature_for_deduplication[arm.signature] = arm

This way you can give the generation strategy a few tries before giving up, and you won't end up with a ton of abandoned trials on your experiment; instead, you can handle RepeatedPointsException (or some other exception) elsewhere in the calling code. Hope that helps!
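
For example, the calling code might look something like this (RepeatedPointsException and generate are the names from the snippet above; num_trials is a hypothetical budget):

# Hypothetical calling code, inside the same class as generate():
# stop the loop gracefully once unique candidates can no longer be drawn.
for i in range(num_trials):
    try:
        self.generate(model_moo)
    except RepeatedPointsException:
        logger.info("No more unique arms after %d trials; stopping.", i)
        break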

bernardbeckerman pushed a commit to bernardbeckerman/Ax that referenced this issue Apr 26, 2024
Summary: Update docs to point users toward the Service API ([context](facebook#2392)).

Reviewed By: mpolson64

Differential Revision: D56635704
facebook-github-bot pushed a commit that referenced this issue Apr 26, 2024
Summary: Update docs to point users toward the Service API ([context](#2392)).

Reviewed By: mpolson64

Differential Revision: D56635704

fbshipit-source-id: f68dd0e2248fd662c82d1c5c9f8c712466022240
@HuizhiXu
Author

Hello, the deduplication function with the Developer API is working well and prevents any duplication from occurring.

We have also recently taken your advice and switched to the Ax Service API. Thank you for your assistance.
