
Creating A Second Comet Logger Disables The First  #19900

Closed
@EtayLivne

Description


Bug description

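If two CometLogger instances are created (for example, to obtain artifacts from an existing Comet experiment before starting a new one), the first raises an exception on any Comet API call made after the second logger's experiment has been created. Because CometLogger creates its experiment lazily, this happens on the second logger's first access to .experiment, which in the reproduction below is the call that logs an artifact to it.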

What version are you seeing the problem on?

v2.1, v2.2

How to reproduce the bug

"""
First make sure you have credentials for the Comet instance you're going to use, for example by setting the COMET_API_KEY environment variable.
"""


from lightning.pytorch.loggers.comet import CometLogger
from comet_ml import API, Artifact, ExistingExperiment


def get_existing_experiment(
    workspace,
    project_name,
    experiment
):
    api_experiment = API().get_experiment(
        workspace=workspace,  # e.g. "etayl"
        project_name=project_name,  # e.g. "angie-tutorial"
        experiment=experiment,  # e.g. "spatial_sturgeon_1690"
    )
    exp_obj = ExistingExperiment(experiment_key=api_experiment.key)
    return exp_obj, api_experiment.key


def get_new_experiment(
    workspace,
    project_name,
    experiment_name
):
    api_experiment = API()._create_experiment(
        workspace=workspace,
        project_name=project_name,
        experiment_name=experiment_name,
    )
    exp_obj = ExistingExperiment(experiment_key=api_experiment.key)
    return exp_obj, api_experiment.key


def add_artifact_to_experiment(existing_experiment: ExistingExperiment, artifact_name: str):
    new_artifact = Artifact(artifact_name)   # "artifact-file.txt"
    with open("artifact-file.txt", "w") as handler:
        handler.write("file content")
        
    new_artifact.add("artifact-file.txt")
    existing_experiment.log_artifact(new_artifact)

def init_comet_logger(
    workspace, #"etayl",
    project_name, #"angie-tutorial",
    experiment_name, #"spatial_sturgeon_1690",
    experiment_key
):

    comet_logger = CometLogger(
        workspace=workspace,  # e.g. "etayl"
        project_name=project_name,  # e.g. "angie-tutorial"
        experiment_name=experiment_name,  # e.g. "spatial_sturgeon_1690"
        experiment_key=experiment_key,
        offline=False,
        save_dir="/homes/etayl/code/menta3/temp_save_dir",
        auto_output_logging="native",
        auto_metric_logging=False,
        log_env_details=True,
        log_env_gpu=True,
        log_env_cpu=True,
        log_env_host=True,
    )
    return comet_logger

def fail_when_two_loggers_live_at_the_same_time():
    workspace = "etayl"
    project_name = "angie-tutorial"
    experiment_name = "spatial_sturgeon_1690"
    artifact_name = "etayrtifact"
    
    experiment_obj, experiment_key = get_existing_experiment(
        workspace,
        project_name,
        experiment_name
    )
    
    first_comet_logger = init_comet_logger(workspace, project_name, experiment_name, experiment_key)
    add_artifact_to_experiment(experiment_obj, artifact_name)

    new_experiment, new_experiment_key = get_new_experiment(
        workspace,
        project_name,
        "new_" + experiment_name
    )
    first_comet_logger.experiment.get_artifact(artifact_name)
    print("\n ### that succeeds ### \n")
    
    second_comet_logger = init_comet_logger(workspace, project_name, new_experiment.name, new_experiment_key)
    
    first_comet_logger.experiment.get_artifact(artifact_name)
    print("\n ### this too ### \n")
    
    add_artifact_to_experiment(second_comet_logger.experiment, artifact_name)
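    try:
        first_comet_logger.experiment.get_artifact(artifact_name)
    except Exception as err:  # comet_ml.exceptions.ExperimentNotAlive, see the traceback below
        print("\n ### but this fails! ### \n")
        print(repr(err))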

fail_when_two_loggers_live_at_the_same_time()

Error messages and logs

Traceback (most recent call last):
  File "/code_path/get_artifact_from_dead_experiment.py", line 146, in <module>
    fail_when_two_loggers_live_at_the_same_time()
  File "/code_path/get_artifact_from_dead_experiment.py", line 134, in fail_when_two_loggers_live_at_the_same_time
    first_comet_logger.experiment.get_artifact(artifact_name)
  File "/made_up_env_path/lib/python3.11/site-packages/comet_ml/_online.py", line 1097, in get_artifact
    raise ExperimentNotAlive(
comet_ml.exceptions.ExperimentNotAlive: Experiment <comet_ml._online.ExistingExperiment object at 0x7fffd23da910> is not alive, cannot get artifact

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):  Logger (specifically, CometLogger)
#- PyTorch Lightning Version (e.g., 1.5.0): 2.2 (and 2.1)
#- Lightning App Version (e.g., 0.5.2): 2.2 (and 2.1)
#- PyTorch Version (e.g., 2.0): 2.2.1
#- Python version (e.g., 3.9): 3.11
#- OS (e.g., Linux): Linux
#- How you installed Lightning(`conda`, `pip`, source): Conda (Mamba)
#- Running environment of LightningApp (e.g. local, cloud): local

More info

Theory for the source of the bug

  1. The PyTorch Lightning CometLogger class has an ._experiment attribute that holds either an Experiment or an ExistingExperiment object. To reset the experiment the logger talks to, you can set ._experiment to None (or call the .finalize() method, which does the same).

  2. The CometLogger exposes a public .experiment @property, which first checks whether ._experiment is None and, if so, creates a new Experiment/ExistingExperiment object behind the scenes.

  3. However, each Experiment/ExistingExperiment also has a boolean .alive flag and only works while that flag is True. The flag can be set to False manually by calling the .end() method, but it also appears that Comet supports only one alive experiment at a time: when one experiment is interacted with (for example, by logging an artifact to it), the .alive flag of the other experiments is set to False.
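A minimal, dependency-free model of points 1-3 may make the interaction easier to see. FakeComet, FakeExperiment and SketchLogger are hypothetical stand-ins, not the real comet_ml or Lightning classes; RuntimeError stands in for ExperimentNotAlive; and the rule "creating a new experiment marks the previously active one as not alive" is an assumption taken from the theory in point 3, not from Comet's documentation.

```python
# Hypothetical stand-ins: these are NOT the real comet_ml / Lightning classes.
# They only model the behaviour described in points 1-3 of the theory.


class FakeComet:
    """Assumption: only one experiment can be alive at a time."""

    current = None  # the experiment currently considered alive

    @classmethod
    def activate(cls, experiment):
        # Activating a new experiment marks the previously active one as dead.
        if cls.current is not None and cls.current is not experiment:
            cls.current.alive = False
        cls.current = experiment


class FakeExperiment:
    def __init__(self, key):
        self.key = key
        self.alive = True
        FakeComet.activate(self)

    def get_artifact(self, name):
        if not self.alive:
            raise RuntimeError(f"Experiment {self.key} is not alive, cannot get artifact")
        return f"{self.key}/{name}"


class SketchLogger:
    def __init__(self, experiment_key):
        self._experiment_key = experiment_key
        self._experiment = None  # point 1: cached experiment object

    @property
    def experiment(self):
        # Point 2: the object is (re)created only when the cache is None;
        # the alive flag is never consulted.
        if self._experiment is None:
            self._experiment = FakeExperiment(self._experiment_key)
        return self._experiment


first = SketchLogger("first-key")
first.experiment.get_artifact("a")     # works
second = SketchLogger("second-key")    # nothing created yet (lazy)
first.experiment.get_artifact("a")     # still works
second.experiment.get_artifact("a")    # creates the second experiment, killing the first
try:
    first.experiment.get_artifact("a")
    failed = False
except RuntimeError:
    failed = True
print(failed)  # True
```

The first logger's cache is still non-None, so its .experiment property keeps returning the dead object instead of creating a fresh one, which mirrors the "works, works, fails" sequence in the reproduction.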

Suggested fix

The source of the problem seems to be that the Lightning logger tracks whether it has a live experiment by resetting ._experiment to None once it is done with it. This is a different mechanism from the one built into Comet, which uses the .alive attribute on each Experiment/ExistingExperiment instance.

The current issue, and potential future ones, could be resolved by relying on the built-in mechanism instead of the current separate one. As long as there are two different mechanisms, they can get out of sync with each other.
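Under the same assumption that only one experiment is alive at a time, the suggested fix amounts to having the .experiment property treat a dead experiment the same way as a missing one. This is a sketch with hypothetical stand-in classes (FakeComet, FakeExperiment, FixedLogger), not Lightning's actual code; in the real logger the check would presumably read the .alive attribute described in point 3 of the theory.

```python
# Hypothetical stand-ins, not the real comet_ml / Lightning classes.


class FakeComet:
    current = None  # assumption: at most one alive experiment at a time

    @classmethod
    def activate(cls, experiment):
        if cls.current is not None and cls.current is not experiment:
            cls.current.alive = False
        cls.current = experiment


class FakeExperiment:
    def __init__(self, key):
        self.key = key
        self.alive = True
        FakeComet.activate(self)

    def get_artifact(self, name):
        if not self.alive:
            raise RuntimeError(f"Experiment {self.key} is not alive")
        return f"{self.key}/{name}"


class FixedLogger:
    def __init__(self, experiment_key):
        self._experiment_key = experiment_key
        self._experiment = None

    @property
    def experiment(self):
        # Sketch of the fix: rely on the alive flag, not only on the None check,
        # and reopen the experiment from its stored key when it has died.
        if self._experiment is None or not self._experiment.alive:
            self._experiment = FakeExperiment(self._experiment_key)
        return self._experiment


first = FixedLogger("first-key")
first.experiment.get_artifact("a")
second = FixedLogger("second-key")
second.experiment.get_artifact("a")          # kills the first logger's object
result = first.experiment.get_artifact("a")  # recreated instead of raising
print(result)  # first-key/a
```

If Comet really allows only one alive experiment per process, this check only prevents the exception: the two loggers still take turns, and each time one of them is used its experiment is reopened from the stored key, which in turn marks the other one as not alive.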

Metadata


Assignees: No one assigned

Labels: bug, needs triage, ver: 2.1.x
