Creating A Second Comet Logger Disables The First #19900

Closed
EtayLivne opened this issue May 23, 2024 · 0 comments · Fixed by #19915
Labels
bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.1.x


EtayLivne commented May 23, 2024

Bug description

If two instances of a CometLogger are created (for example, in order to obtain artifacts from an existing Comet experiment before starting a new one), the first will throw an exception on any Comet API action attempted after the second one is created.

What version are you seeing the problem on?

v2.1, v2.2

How to reproduce the bug

"""
First make sure you have credentials on the comet instance you're going to use, most likely by setting the COMET_API_KEY env variable. 
"""


from lightning.pytorch.loggers.comet import CometLogger
from comet_ml import API, Artifact, ExistingExperiment

from os import environ

def get_existing_experiment(
    workspace,
    project_name,
    experiment
):
    api_experiment = API().get_experiment(
                workspace=workspace, #"etayl",
                project_name=project_name, #"angie-tutorial",
                experiment=experiment #"spatial_sturgeon_1690",
            )
    exp_obj = ExistingExperiment(experiment_key=api_experiment.key)
    return exp_obj, api_experiment.key


def get_new_experiment(
    workspace,
    project_name,
    experiment_name
):
    api_experiment = \
              API()._create_experiment(
                    workspace=workspace,
                    project_name=project_name,
                    experiment_name=experiment_name,
                )
    exp_obj = ExistingExperiment(experiment_key=api_experiment.key)
    return exp_obj, api_experiment.key


def add_artifact_to_experiment(existing_experiment: ExistingExperiment, artifact_name: str):
    new_artifact = Artifact(artifact_name)   # "artifact-file.txt"
    with open("artifact-file.txt", "w") as handler:
        handler.write("file content")
        
    new_artifact.add("artifact-file.txt")
    existing_experiment.log_artifact(new_artifact)

def init_comet_logger(
    workspace, #"etayl",
    project_name, #"angie-tutorial",
    experiment_name, #"spatial_sturgeon_1690",
    experiment_key
):

    comet_logger = CometLogger(
                workspace=workspace, #"etayl",
                project_name=project_name, #"angie-tutorial",
                experiment_name=experiment_name, #"spatial_sturgeon_1690",
                experiment_key=experiment_key, #experiment_key,
                offline=False,
                save_dir="/homes/etayl/code/menta3/temp_save_dir",
                auto_output_logging="native",
                auto_metric_logging=False,
                log_env_details=True,
                log_env_gpu=True,
                log_env_cpu=True,
                log_env_host=True
            )
    return comet_logger

def fail_when_two_loggers_live_at_the_same_time():
    workspace = "etayl"
    project_name = "angie-tutorial"
    experiment_name = "spatial_sturgeon_1690"
    artifact_name = "etayrtifact"
    
    experiment_obj, experiment_key = get_existing_experiment(
        workspace,
        project_name,
        experiment_name
    )
    
    first_comet_logger = init_comet_logger(workspace, project_name, experiment_name, experiment_key)
    add_artifact_to_experiment(experiment_obj, artifact_name)

    new_experiment, new_experiment_key = get_new_experiment(
        workspace,
        project_name,
        "new_" + experiment_name
    )
    first_comet_logger.experiment.get_artifact(artifact_name)
    print("\n ### that succeeds ### \n")
    
    second_comet_logger = init_comet_logger(workspace, project_name, new_experiment.name, new_experiment_key)
    
    first_comet_logger.experiment.get_artifact(artifact_name)
    print("\n ### this too ### \n")
    
    add_artifact_to_experiment(second_comet_logger.experiment, artifact_name)
    try:
        first_comet_logger.experiment.get_artifact(artifact_name)
    except Exception:  # comet_ml raises ExperimentNotAlive here (see the traceback below)
        print("\n ### but this fails! ### \n")

fail_when_two_loggers_live_at_the_same_time()

Error messages and logs

Traceback (most recent call last):
  File "/code_path/get_artifact_from_dead_experiment.py", line 146, in <module>
    fail_when_two_loggers_live_at_the_same_time()
  File "/code_path/get_artifact_from_dead_experiment.py", line 134, in fail_when_two_loggers_live_at_the_same_time
    first_comet_logger.experiment.get_artifact(artifact_name)
  File "/made_up_env_path/lib/python3.11/site-packages/comet_ml/_online.py", line 1097, in get_artifact
    raise ExperimentNotAlive(
comet_ml.exceptions.ExperimentNotAlive: Experiment <comet_ml._online.ExistingExperiment object at 0x7fffd23da910> is not alive, cannot get artifact

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):  Logger (specifically, CometLogger)
#- PyTorch Lightning Version (e.g., 1.5.0): 2.2 (and 2.1)
#- Lightning App Version (e.g., 0.5.2): 2.2 (and 2.1)
#- PyTorch Version (e.g., 2.0): 2.2.1
#- Python version (e.g., 3.9): 3.11
#- OS (e.g., Linux): Linux
#- How you installed Lightning(`conda`, `pip`, source): Conda (Mamba)
#- Running environment of LightningApp (e.g. local, cloud): local

More info

Theory for source of the bug

  1. The PyTorch Lightning CometLogger class has a ._experiment attribute that holds either an Experiment or an ExistingExperiment object. To reset the experiment the logger is talking to, you can set ._experiment to None (or call the .finalize() utility, which does that too).

  2. The CometLogger has a "public" .experiment property (a method decorated with @property), which first checks whether ._experiment is None and, if so, creates a new Experiment/ExistingExperiment object behind the scenes.

  3. However, each Experiment/ExistingExperiment also has a boolean "alive" flag and only works while that flag is True. The flag can be set to False manually by calling the .end() method, but it also appears that Comet only supports one alive experiment at a time: when one experiment is interacted with (for example, by logging an artifact to it), the .alive flag of the other experiments is set to False.
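To make the theory above concrete, here is a minimal toy model of the suspected interaction. It does not use comet_ml or Lightning: ToyExperiment and ToyLogger are hypothetical stand-ins, and the "only one alive experiment at a time" rule is my assumption about Comet's behaviour, not something I have confirmed in its source.

```python
class ToyExperiment:
    """Hypothetical stand-in for a Comet experiment (NOT the comet_ml API)."""

    _current = None  # assumption: only one experiment may be alive at a time

    def __init__(self, key):
        self.key = key
        self.alive = True
        # Assumed behaviour: activating a new experiment deactivates the old one.
        if ToyExperiment._current is not None:
            ToyExperiment._current.alive = False
        ToyExperiment._current = self

    def get_artifact(self, name):
        if not self.alive:
            raise RuntimeError(f"Experiment {self.key} is not alive, cannot get artifact")
        return f"{name}@{self.key}"


class ToyLogger:
    """Hypothetical stand-in for the logger: it only tracks None vs. not None."""

    def __init__(self, key):
        self._key = key
        self._experiment = None

    @property
    def experiment(self):
        # The logger's own mechanism: create lazily, never look at .alive.
        if self._experiment is None:
            self._experiment = ToyExperiment(self._key)
        return self._experiment


first = ToyLogger("first")
second = ToyLogger("second")

first.experiment.get_artifact("artifact")  # works: first's experiment is alive
second.experiment                          # lazily creates second's experiment

try:
    first.experiment.get_artifact("artifact")
    failed = False
except RuntimeError:
    failed = True  # first's cached experiment is stale but was never reset to None
```

In this model the first logger keeps handing out its cached experiment because ._experiment is not None, even though the experiment's alive flag has already been flipped to False.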

Suggested fix

The source of the problem seems to be that the Lightning logger tracks whether it has a live experiment by setting ._experiment to None once it is done with it, which is a different mechanism from the one built into Comet, which relies on an .alive attribute on each experiment object.

Resolving the current and potential future issues can be achieved by using Comet's built-in mechanism instead of the current separate one. As long as there are two different mechanisms, they can fall out of sync with each other.
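One possible shape for this, sketched with toy classes rather than the real comet_ml or Lightning code: the .experiment property consults the experiment's own alive flag in addition to the None check. ToyExperiment and AliveAwareLogger are hypothetical names, the single-alive-experiment rule is an assumption, and this is not necessarily what the linked fix does.

```python
class ToyExperiment:
    """Hypothetical stand-in for a Comet experiment (NOT the comet_ml API)."""

    _current = None  # assumption: only one experiment may be alive at a time

    def __init__(self, key):
        self.key = key
        self.alive = True
        if ToyExperiment._current is not None:
            ToyExperiment._current.alive = False
        ToyExperiment._current = self

    def get_artifact(self, name):
        if not self.alive:
            raise RuntimeError(f"Experiment {self.key} is not alive, cannot get artifact")
        return f"{name}@{self.key}"


class AliveAwareLogger:
    """Hypothetical logger that defers to the experiment's own alive flag."""

    def __init__(self, key):
        self._key = key
        self._experiment = None

    @property
    def experiment(self):
        # Re-create the experiment if it is missing OR no longer alive,
        # so the logger cannot hand out a stale object.
        if self._experiment is None or not self._experiment.alive:
            self._experiment = ToyExperiment(self._key)
        return self._experiment


first = AliveAwareLogger("first")
second = AliveAwareLogger("second")

first.experiment.get_artifact("artifact")
second.experiment                                   # deactivates first's experiment
result = first.experiment.get_artifact("artifact")  # first re-attaches, no error
```

Note that in this sketch the two loggers take turns reviving their experiments, so each access may reconnect. Whether that is acceptable depends on how expensive re-attaching to an experiment is in practice.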

@EtayLivne EtayLivne added bug Something isn't working needs triage Waiting to be triaged by maintainers labels May 23, 2024