
Multi-objective experiments generate duplicated data #2392

Closed
HuizhiXu opened this issue Apr 23, 2024 · 6 comments

@HuizhiXu

Hello,

I have been utilizing the multi-objective optimization approach for my specific use case. Here is a brief overview of the code I have implemented:

from ax.modelbridge.registry import Models
from ax.models.torch.botorch_moo_defaults import get_NEHVI

# Fit a multi-objective model to the experiment's current data, using
# a noisy expected hypervolume improvement acquisition function.
model = Models.MOO(
    experiment=self.experiment,
    data=self.experiment.fetch_data(),
    acqf_constructor=get_NEHVI,
)

# Ask the model for one new candidate point.
generator_run = model.gen(1)

The code has been functioning effectively up until now. However, I have recently encountered an error. It appears that the generator may be producing duplicate data points during its operation.

[Screenshot 2024-04-23 17:11: trial table showing the duplicate]

Trial 23 and Trial 24 have the same parameters.

Upon investigating potential causes, I found a similar issue: "Completed Multi-objective NAS experiments without metrics and repeated parameter selections". The user in that issue also faced duplicate data generation.

The resolution suggested there was to use the Ax Service API and set the should_deduplicate parameter to True in the call to choose_generation_strategy.
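
For context, that call looks roughly like this in the Service API (the one-parameter search space below is only a toy example):

from ax.core.parameter import ParameterType, RangeParameter
from ax.core.search_space import SearchSpace
from ax.modelbridge.dispatch_utils import choose_generation_strategy

# Toy search space; the relevant part is should_deduplicate=True, which
# tells the generation strategy to retry when it draws an arm that
# already exists on the experiment.
search_space = SearchSpace(
    parameters=[
        RangeParameter(
            name="x", parameter_type=ParameterType.FLOAT, lower=0.0, upper=1.0
        ),
    ]
)
generation_strategy = choose_generation_strategy(
    search_space=search_space,
    should_deduplicate=True,
)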

Given this, I am wondering whether there is an equivalent parameter in the Ax Developer API that I could use to address duplicate data generation in my case. If such a parameter exists, I would appreciate guidance on how to use it in my current setup.

@bernardbeckerman bernardbeckerman self-assigned this Apr 23, 2024
@bernardbeckerman
Contributor

Taking a look!

@bernardbeckerman
Contributor

@HuizhiXu Thanks for reporting! should_deduplicate is an argument to GenerationStep/GenerationNode, and the associated logic is handled here, so you are welcome to look to that as an example. However, we generally recommend switching to the Ax Service API whenever possible, since it provides a better experience and we are able to offer better support for it. Are there any particular reasons the Service API does not fit your needs?
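
For illustration, here is a rough sketch of a GenerationStrategy whose steps opt into deduplication (the models and trial counts are arbitrary placeholders):

from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

# Each step deduplicates its own candidates by retrying generation
# whenever a previously seen arm is drawn.
generation_strategy = GenerationStrategy(
    steps=[
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,
            should_deduplicate=True,
        ),
        GenerationStep(
            model=Models.MOO,
            num_trials=-1,  # no cap on trials from this step
            should_deduplicate=True,
        ),
    ]
)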

@bernardbeckerman
Contributor

Also, to address your initial issue, Ax sometimes suggests previously sampled arms if there is observational noise in your metrics, or if Ax is unsure of the degree of observational noise, in order to get a better idea of the true mean and noise at a given point in the search space. If you would like to prevent Ax from re-suggesting points, please see my comment above for some pointers.
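
As a toy illustration of why repeats can be informative under noise (generic statistics, not Ax-specific code): averaging n noisy observations of the same arm shrinks the standard error of the mean by a factor of √n.

import math

noise_sd = 2.0  # assumed observation noise for a single metric reading
for n_repeats in (1, 2, 4):
    sem = noise_sd / math.sqrt(n_repeats)
    print(f"{n_repeats} observation(s): standard error of the mean = {sem:.2f}")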

@HuizhiXu
Author

@bernardbeckerman Thank you for your response and guidance.

The reasons I chose the Developer API over the Service API are as follows:

  • Your website states that the Developer API supports more customization; since we want to benchmark different optimization methods, we chose this option.
  • The Developer API has a Pydantic-style schema format.

For example, the Service API's parameter constraints are defined as:

parameter_constraints=["A + B + C + D + E <= 100.0"],

In contrast, the Developer API allows for:

# Typed constraint: the listed parameters must sum to at most the bound.
parameter_constraints.append(
    SumConstraint(
        parameters=parameters,
        is_upper_bound=True,
        bound=param_cons.bound,
    )
)

Additionally, the Developer API's settings for the SearchSpace and the objective configuration are, I find, more aligned with the Pydantic style, as in the sketch below.
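
For instance, a rough sketch of the typed style I mean (the parameter and metric names here are made up):

from ax.core.metric import Metric
from ax.core.objective import MultiObjective, Objective
from ax.core.optimization_config import MultiObjectiveOptimizationConfig
from ax.core.parameter import ParameterType, RangeParameter
from ax.core.search_space import SearchSpace

# Explicitly typed search space and multi-objective configuration.
search_space = SearchSpace(
    parameters=[
        RangeParameter(
            name="A", parameter_type=ParameterType.FLOAT, lower=0.0, upper=100.0
        ),
        RangeParameter(
            name="B", parameter_type=ParameterType.FLOAT, lower=0.0, upper=100.0
        ),
    ]
)
optimization_config = MultiObjectiveOptimizationConfig(
    objective=MultiObjective(
        objectives=[
            Objective(metric=Metric(name="accuracy"), minimize=False),
            Objective(metric=Metric(name="latency"), minimize=True),
        ]
    )
)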

Overall, however, I am not aware of what other specific advantages the Developer API has over the Service API.

Furthermore, I would like to know whether the following approach to deduplication in my code is correct.

I have defined an arms_by_signature_for_deduplication dictionary to store the signature of each arm. After each new trial is generated, I check whether the signature of the new arm is already in arms_by_signature_for_deduplication; if it is, I mark the trial as abandoned.

Here is the code snippet:

def deduplicated_arm(self, arm):
    # True if this arm's parameterization was generated before.
    return arm.signature in self.arms_by_signature_for_deduplication


def generate(self, i, model_moo):
    generator_run = model_moo.gen(n=1)
    trial = self.experiment.new_trial(generator_run=generator_run)
    arm = trial.arms[0]
    if self.deduplicated_arm(arm):
        logger.info("Duplicated arm")
        trial.mark_abandoned(reason="duplication")
        return

    logger.info("Not duplicated arm")
    self.arms_by_signature_for_deduplication[arm.signature] = arm

    ...

@bernardbeckerman
Contributor

bernardbeckerman commented Apr 25, 2024

  • Your website states that the Developer API supports more customization; since we want to benchmark different optimization methods, we chose this option.
  • The Developer API has a Pydantic-style schema format.

These are excellent points! I'll discuss them with the team and see if there's an opportunity to clarify our documentation and/or move toward typed inputs in the Service API. Thanks for raising them!

Furthermore, I would like to know whether the following approach to deduplication in my code is correct.

This looks good, although to match the Ax-internal logic linked above more closely, it might be better to check for duplication on the generator run's arms themselves and retry gen until you either succeed in generating a unique arm or reach some maximum number of draws:

class RepeatedPointsException(Exception):
    """Raised when gen keeps returning previously seen arms."""


def is_duplicate(arm, arms_by_signature_for_deduplication):
    # True if this arm's parameterization has been generated before.
    return arm.signature in arms_by_signature_for_deduplication


def generate(self, model_moo, max_gen_draws_for_deduplication=5):  # 5 = example retry budget
    arm_is_unique = False
    n_gen_draws = 0
    while not arm_is_unique and n_gen_draws < max_gen_draws_for_deduplication:
        generator_run = model_moo.gen(n=1)
        arm_is_unique = not is_duplicate(
            generator_run.arms[0], self.arms_by_signature_for_deduplication
        )
        n_gen_draws += 1
    if not arm_is_unique:
        raise RepeatedPointsException(
            f"Could not generate a unique arm after "
            f"{max_gen_draws_for_deduplication} tries."
        )
    trial = self.experiment.new_trial(generator_run=generator_run)
    arm = trial.arms[0]
    self.arms_by_signature_for_deduplication[arm.signature] = arm

This way you can give the generation strategy a few tries before giving up, and you won't end up with a ton of abandoned trials on your experiment; instead, you can handle RepeatedPointsException (or some other exception) elsewhere in the calling code. Hope that helps!
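
For example, the calling code might look something like this (RepeatedPointsException and generate are the names from the snippet above; num_trials is a hypothetical budget):

# Hypothetical calling code, inside the same class as generate():
# stop the loop gracefully once unique candidates can no longer be drawn.
for i in range(num_trials):
    try:
        self.generate(model_moo)
    except RepeatedPointsException:
        logger.info("No more unique arms after %d trials; stopping.", i)
        break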

bernardbeckerman pushed a commit to bernardbeckerman/Ax that referenced this issue Apr 26, 2024
Summary: Update docs to point users toward the Service API ([context](facebook#2392)).

Reviewed By: mpolson64

Differential Revision: D56635704
facebook-github-bot pushed a commit that referenced this issue Apr 26, 2024
Summary: Update docs to point users toward the Service API ([context](#2392)).

Reviewed By: mpolson64

Differential Revision: D56635704

fbshipit-source-id: f68dd0e2248fd662c82d1c5c9f8c712466022240
@HuizhiXu
Author

Hello, the deduplication function with the Developer API is working well and prevents any duplication from occurring.

We have also recently taken your advice and switched to the Ax Service API. Thank you for your assistance.
